▲ About memory pressure, lock contention, and Data-oriented Design (mnt.io)
54 points by vinhnx 4 days ago | 5 comments
▲ asQuirreL 34 minutes ago | parent | next [-]
The post can be summarised quite succinctly: everything was slow because sorting was taking a lot of time. Sorting was slow because its comparator was taking ~6 read locks on every comparison, and was cloning large structures to avoid holding the locks for long. The first fix was to access just the information needed, avoiding the clones. The second fix was to cache exactly the data needed for sorting whenever the underlying data was updated, so the comparator could use the cache without taking the underlying lock at all. I'm looking forward to the next post about how cache consistency is tough.
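The second fix can be sketched in a few lines of Rust. `Item` and its `priority` field are hypothetical stand-ins for the post's actual structures; the point is the contrast between locking inside the comparator and snapshotting the sort key once per item:

```rust
use std::sync::RwLock;

// Hypothetical item whose state sits behind a lock, as in the post.
struct Item {
    state: RwLock<ItemState>,
}

struct ItemState {
    priority: u32,
}

// Before: the comparator takes read locks on every comparison, so one
// sort performs O(n log n) lock acquisitions.
fn sort_locking(items: &mut [Item]) {
    items.sort_by(|a, b| {
        let a = a.state.read().unwrap();
        let b = b.state.read().unwrap();
        a.priority.cmp(&b.priority)
    });
}

// After: the sort key is snapshotted once per item (O(n) lock
// acquisitions); comparisons then touch only plain cached data.
fn sort_cached(items: &mut [Item]) {
    items.sort_by_cached_key(|item| item.state.read().unwrap().priority);
}
```

`sort_by_cached_key` approximates the fix for a one-off sort; the post goes further and keeps the cache alive across sorts by refreshing it on update.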
▲ bob1029 2 hours ago | parent | prev | next [-]
> So, yes, it takes time to read from memory. That's why we try to avoid allocations as much as possible.

This whole thing is summed up by some pretty basic physics. What you actually want to minimize is the communication of information between physical cores. Nothing else really matters; certainly not terminology or clever tricks that effectively try to cheat thermodynamics. The cost of communicating information is almost always much more substantial than the cost of computing over that same information. The ALU is not nearly as power hungry as Infinity Fabric.
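One concrete (and hypothetical; it is not from the article) instance of minimizing cross-core communication is avoiding false sharing: two counters that happen to share a 64-byte cache line force the line to ping-pong between cores even though the threads never touch each other's data. A sketch, assuming a 64-byte line size:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Pad each per-thread counter to its own 64-byte cache line, so a write
// by one core does not invalidate the line another core is writing to.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

fn count_in_parallel(iters: u64) -> u64 {
    let counters = [
        PaddedCounter(AtomicU64::new(0)),
        PaddedCounter(AtomicU64::new(0)),
    ];
    thread::scope(|s| {
        for c in &counters {
            s.spawn(move || {
                for _ in 0..iters {
                    c.0.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counters.iter().map(|c| c.0.load(Ordering::Relaxed)).sum()
}
```

The result is identical with or without the padding; only the inter-core cache-coherence traffic (and therefore the wall-clock time) changes, which is exactly the "communication, not computation" cost described above.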
▲ pastescreenshot 6 hours ago | parent | prev [-]
The useful distinction here is not just AoS vs SoA; it is moving expensive work off the hot path. The biggest win in the article seems to be caching the sort/filter inputs so that lock-taking and cache misses happen on updates, not during every comparison. That is a very transferable lesson even if you never go full data-oriented design.
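The "pay on update, not on compare" pattern can be sketched without any locks at all. `Row`, `tokens`, and the key derivation below are hypothetical; the transferable part is that the derived key is refreshed when the data changes, so the sort itself only compares plain cached values:

```rust
// Hypothetical row whose sort key is expensive to derive.
struct Row {
    tokens: Vec<String>,
    sort_key: String, // cached; refreshed only when `tokens` changes
}

impl Row {
    fn new(tokens: Vec<String>) -> Self {
        let mut row = Row { tokens, sort_key: String::new() };
        row.refresh_key();
        row
    }

    fn update(&mut self, tokens: Vec<String>) {
        self.tokens = tokens;
        self.refresh_key(); // the expensive work happens here, once per update
    }

    // Stand-in for an expensive derivation (normalisation, lookups, ...).
    fn refresh_key(&mut self) {
        self.sort_key = self.tokens.join("/").to_lowercase();
    }
}

fn sort_rows(rows: &mut [Row]) {
    // Hot path: no locks, no recomputation, just cached keys.
    rows.sort_by(|a, b| a.sort_key.cmp(&b.sort_key));
}
```

If rows update rarely and sorts run often, this trades a little extra work per update for comparisons that are branch-light and cache-friendly.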