ltbarcly3 2 days ago:
I'm sorry for being rude. To address your question: yes, in fact all hash table collisions happen during the search-key matching phase; if you study the collision resolution strategies on the Wikipedia page you will see this. https://en.wikipedia.org/wiki/Hash_table#Collision_resolutio...
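
To make that concrete, here's a minimal open-addressing sketch (the Slot struct, linear probing, and the power-of-two table size are my own illustrative choices, not from any particular implementation). A collision is precisely the case where the probe lands on an occupied slot whose key doesn't match the search key, so it is detected and resolved by key comparison inside the lookup loop:

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    struct Slot {
        bool occupied = false;
        std::string key;
        int value = 0;
    };

    std::optional<int> find(const std::vector<Slot>& table,
                            const std::string& key) {
        // Assumes table.size() is a power of two and the table always
        // keeps at least one empty slot, so the probe loop terminates.
        const std::size_t mask = table.size() - 1;
        std::size_t i = std::hash<std::string>{}(key) & mask;
        while (table[i].occupied) {
            if (table[i].key == key)     // match: search done
                return table[i].value;
            i = (i + 1) & mask;          // mismatch: a collision, keep probing
        }
        return std::nullopt;             // empty slot: key is absent
    }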
menaerus a day ago:
OK, I can see that I didn't manage to fully explain myself. By replying to your "collisions" argument I actually meant something more than that, and while what I wanted to say seemed obvious in my head, I agree my wording was imprecise. Sorry about that.

What I pictured is that ingestion workloads, in contrast to read-only workloads, are different in that collisions in the former can trigger hash-map resizing/rehashing, moving a bunch of data to and from the buckets. That is why I say these are two distinct workloads; see the sketch below.

So my hypothesis is the following: if I PGO the implementation using a dataset (volume, data_distribution) that triggers a handful of such resizing/rehashing operations, and we observe positive improvements in wall-time runtime performance, what are the chances that the same binary, now run over a completely different dataset, will preserve the same runtime performance improvements? What you're saying, if I understand correctly, is that the improvements will be the same regardless of the dataset (and even the type of workload), and I am saying I am not so sure about that.
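
Here's a minimal sketch of the distinction I mean (the Table class, linear probing, and the ~0.7 load-factor threshold are illustrative assumptions on my part, not the implementation under discussion). insert() has a branch that lookups never take: past the load-factor threshold it allocates a larger bucket array and moves every live entry into it:

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    struct Entry {
        bool occupied = false;
        std::string key;
        int value = 0;
    };

    class Table {
        std::vector<Entry> slots_ = std::vector<Entry>(16);
        std::size_t size_ = 0;

        std::size_t probe(const std::string& key) const {
            const std::size_t mask = slots_.size() - 1;
            std::size_t i = std::hash<std::string>{}(key) & mask;
            while (slots_[i].occupied && slots_[i].key != key)
                i = (i + 1) & mask;   // collision: keep probing
            return i;
        }

        void rehash() {
            // The path a read-only workload never executes: double the
            // table and move every entry to its new bucket.
            std::vector<Entry> old = std::move(slots_);
            slots_.assign(old.size() * 2, Entry{});
            size_ = 0;
            for (Entry& e : old)
                if (e.occupied) insert(e.key, e.value);
        }

    public:
        void insert(const std::string& key, int value) {
            if ((size_ + 1) * 10 > slots_.size() * 7)  // load factor > 0.7
                rehash();
            const std::size_t i = probe(key);
            if (!slots_[i].occupied) {
                slots_[i].occupied = true;
                slots_[i].key = key;
                ++size_;
            }
            slots_[i].value = value;
        }
    };

If the PGO training run never made that rehash branch hot, the compiler's layout and inlining decisions around it were made blind to it, and that is exactly the source of my doubt.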