| ▲ | ltbarcly3 4 days ago | parent [-] | | PGO operates at the CPU branch level. The main benefit, if I understand correctly, is to reduce branch mispredictions, which lead to pipeline stalls. There's absolutely no reason to think that, for a given branching instruction, the specific workload of a system generally determines which branch is most likely, rather than something a random/arbitrary/heuristic profile would capture just as well. For example, when scanning data you might check whether the current value is equal to a test value, and branch either to code that handles the match or to code that moves on to the next entry. That comparison may be extremely likely to match (95%), as in a hash table, or very likely not to match, as in a string search. A compiler can't tell which is true, and your workload literally has no impact on this unless it is pathological. | | |
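(A minimal sketch of the two cases described above, with invented names; not code from the thread. The compare-and-branch has the same shape in both loops, but which side is hot depends entirely on what the loop is doing, which is what a profile can record and a static heuristic cannot.)

    #include <stddef.h>
    #include <stdint.h>

    struct bucket { int used; uint64_t key; int value; };

    /* Hash-table probe: once we land on a used bucket the key usually
       matches, so the "equal" side of the branch is the hot path (~95%). */
    int probe(const struct bucket *b, size_t cap, uint64_t key, size_t i)
    {
        while (b[i].used) {                /* table assumed never completely full */
            if (b[i].key == key)           /* almost always true on this workload */
                return b[i].value;
            i = (i + 1) % cap;             /* rare: collision, keep probing */
        }
        return -1;
    }

    /* String search: the very same comparison is almost never true, so the
       "not equal, move to the next entry" side is the hot path. */
    const char *find_char(const char *s, char c)
    {
        for (; *s; s++)
            if (*s == c)                   /* almost always false */
                return s;
        return NULL;
    }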
| ▲ | menaerus 3 days ago | parent [-] | | I don't follow. If I have an ingestion-heavy workload then my hash map will certainly be stressed in a completely different way than if I have a read-only workload. The data that PGO collects during the former will be different from the data it collects during the latter, so how can the workload not impact the PGO optimizations? Perhaps I misunderstood you. | | |
| ▲ | ltbarcly3 3 days ago | parent [-] | | That is not how hash tables work. Hash tables effectively randomize keys into buckets. Whether there is a hash collision is independent of the workload; it only has to do with the number of entries in the hash table vs. the threshold to resize it. This is what I meant by pathological data. If you had weird data that somehow, over and over, got hash tables right up to their fill limit and just stayed there, then you would have slightly more collisions than an average use case. However, you would have to construct weird data to do this; it wouldn't be "more ingestion". The thing PGO would optimize is individual branches in the hash table code, for example the branching instruction that checks whether the key in a bucket is == to the key you are looking for. | | |
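(For illustration only, with hypothetical helper names: under linear probing the expected number of key comparisons per lookup is, to a good approximation, a function of the load factor alone (Knuth's classic estimates). That is one way to state the claim that collision behaviour tracks fill level rather than what the data means or how it arrived.)

    /* Expected key comparisons per lookup under linear probing,
       as a function of the load factor alpha = entries / capacity. */
    double probes_per_hit(double alpha)    /* successful lookup */
    {
        return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
    }

    double probes_per_miss(double alpha)   /* unsuccessful lookup */
    {
        return 0.5 * (1.0 + 1.0 / ((1.0 - alpha) * (1.0 - alpha)));
    }

    /* e.g. at a 7/8 resize threshold: probes_per_hit(0.875) = 4.5,
       regardless of which keys arrive or in what order. */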
| ▲ | menaerus 2 days ago | parent [-] | | > Whether there is a hash collision is independent of the workload, In a read-only workload, searching through the key-value space, there can't be a collision, no? | | |
| ▲ | ltbarcly3 2 days ago | parent [-] | | I think you should read and understand how a hash table works before you have an opinion on how it works. | | |
| ▲ | menaerus 2 days ago | parent [-] | | You're incredibly rude and also clueless, but I'm nonetheless giving you the benefit of the doubt to see whether there is something I'm missing. But you're right, this is now way over your head. | | |
| ▲ | ltbarcly3 2 days ago | parent [-] | | I'm sorry for being rude. To address your question: yes, in fact all hash table collisions happen during the search-key matching phase; if you study the collision resolution strategies on the Wikipedia page you will see this. https://en.wikipedia.org/wiki/Hash_table#Collision_resolutio... | | |
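(A minimal sketch of one collision-resolution strategy, separate chaining, with invented names; not code from the thread or the linked page. The collision shows up precisely at the key comparison while walking a bucket's chain.)

    #include <stddef.h>
    #include <stdint.h>

    struct node { struct node *next; uint64_t key; int value; };

    static size_t bucket_of(uint64_t key, size_t nbuckets)
    {
        return (size_t)(key * 0x9E3779B97F4A7C15ULL) % nbuckets;
    }

    /* Returns 1 and writes *out on a hit, 0 on a miss. A collision simply
       means this loop performs more than one key comparison. */
    int chained_lookup(struct node *const *table, size_t nbuckets,
                       uint64_t key, int *out)
    {
        for (const struct node *n = table[bucket_of(key, nbuckets)]; n; n = n->next)
            if (n->key == key) {
                *out = n->value;
                return 1;
            }
        return 0;
    }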
| ▲ | menaerus a day ago | parent [-] | | Ok, I can see how I didn't manage to fully explain myself - by replying to your "collisions" argument I actually meant something more than that, and while it seemed obvious in my head what I wanted to say, I can agree that my wording was imprecise. Sorry about that. What I pictured is that ingestion workloads, in contrast to read-only workloads, are different in that collisions in the former can trigger hash-map resizing/rehashing, which is why I say these are two distinct workloads: moving a bunch of data to and from the buckets. So, my hypothesis is the following: if I PGO the implementation using a dataset (volume, data distribution) that triggers a handful of such resizing/rehashing operations, and we observe positive improvements in wall-time runtime performance, what are the chances that the same binary, run over a completely different dataset, will preserve the same runtime performance improvements? What you're saying, if I understand correctly, is that the improvements will be the same regardless of the dataset (and even the type of workload), and I am saying I am not so sure about that. | | |
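(A minimal, hypothetical open-addressing sketch of the branch in question, with invented names: the load-factor check on insert is the one whose bias genuinely differs between an ingestion-heavy profiling run, which takes the grow() path now and then, and a read-only run, which never reaches it at all. Whether a bias learned from one dataset transfers to a very different one is exactly the open question above.)

    #include <stdint.h>
    #include <stdlib.h>

    struct slot  { int used; uint64_t key; int value; };
    struct table { struct slot *slots; size_t cap; size_t count; };  /* cap is a power of two */

    static size_t slot_of(uint64_t key, size_t cap)
    {
        return (size_t)(key * 0x9E3779B97F4A7C15ULL) & (cap - 1);
    }

    static void place(struct table *t, uint64_t key, int value)
    {
        size_t i = slot_of(key, t->cap);
        while (t->slots[i].used && t->slots[i].key != key)
            i = (i + 1) & (t->cap - 1);                  /* linear probing */
        if (!t->slots[i].used)
            t->count++;
        t->slots[i] = (struct slot){ 1, key, value };
    }

    static void grow(struct table *t)                    /* rehash into twice the capacity */
    {
        struct table bigger = { calloc(t->cap * 2, sizeof(struct slot)), t->cap * 2, 0 };
        for (size_t i = 0; i < t->cap; i++)
            if (t->slots[i].used)
                place(&bigger, t->slots[i].key, t->slots[i].value);
        free(t->slots);
        *t = bigger;
    }

    void insert(struct table *t, uint64_t key, int value)
    {
        /* An ingestion-heavy profile records this branch as occasionally
           taken; a read-only profile never exercises insert() at all. */
        if (8 * (t->count + 1) > 7 * t->cap)             /* past 7/8 load: resize */
            grow(t);
        place(t, key, value);
    }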
|
|
|
|
|
|
|
|