▲ YesBox 4 days ago
While working on my game, I tried implementing a custom priority queue. It ended up being >2x faster in a debug build, but 2x-5x slower in a release build (??!!?) [1]. I haven't learned much about compilers or "lower level" C++, so I moved on at that point.

How it worked:

1. The P.Q. created a vector and resized it to known bounds.

2. The P.Q. kept track of and updated the "active sorting range" each time an element was inserted or popped.

2B. Each time an element was added, it used the closest unused vector element and updated the ".end" of the range to sort.

2C. Each time an element was removed, it updated the ".start" of the range.

3. In theory this should have saved reallocation overhead.

[1] I believe Visual Studio uses -O0 for debug, and -O2 for release.
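The scheme described above might look roughly like this minimal sketch. The class and member names are my own invention, and the comment doesn't say how the active range was kept ordered, so I've assumed a full re-sort on push (which would make each insertion O(N log N) and could by itself explain the release-build slowdown against std::priority_queue's O(log N) heap push):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical reconstruction: a preallocated vector with a moving
// [start, end) "active sorting range". Pops advance start instead of
// erasing, pushes take the next unused slot instead of reallocating.
struct RangePQ {
    std::vector<int> buf;
    std::size_t start = 0, end = 0;

    explicit RangePQ(std::size_t capacity) { buf.resize(capacity); }

    void push(int v) {
        buf[end++] = v;  // closest unused element; bumps ".end"
        // Assumption: keep the active range sorted ascending so the
        // minimum always sits at buf[start].
        std::sort(buf.begin() + start, buf.begin() + end);
    }

    int pop() {
        return buf[start++];  // remove by bumping ".start"
    }

    bool empty() const { return start == end; }
};
```

Note that nothing here reclaims slots before `start`, so a long-lived queue would also need some compaction or wrap-around step the original comment doesn't describe.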
▲ adzm 4 days ago
These kinds of things are really interesting, and a good example of the importance of performance testing in optimized builds. I encountered something similar and realized the issue was different sizes of structures in debug vs release, which ended up causing the CPU cache lines to overlap and invalidate. Though that sounds unlikely here.
▲ shihab 4 days ago
I'm curious what you did with the "active sorting range" after a push/pop event. Since it's a vector underneath, I don't see any option other than to sort the entire range after each event, O(N). This would surely destroy performance, right?
▲ FuckButtons 4 days ago
And this is why you go and look at the assembly in godbolt to see wtf is going on. | ||||||||||||||