pron 2 days ago
> It’s the same as other layers we have like auto-vectorization where you don’t know if it’s working without performance analysis. The complexity compounds and reasoning about performance gets harder because the interactions get more complex with abstractions like these.

Reasoning about performance is hard as it is, given nondeterministic optimisations by the CPU. Furthermore, a program that's optimal for one implementation of the AArch64 architecture can be far from optimal for a different implementation of the same architecture. Because of that, reasoning deeply about micro-optimisations can be counterproductive, as your analysis today could be outdated tomorrow (or on a different vendor's chip). Full low-level control is helpful when you have full knowledge of the exact environment, including hardware details, and may be harmful otherwise.

What is meant by "performance" is also subjective. Improving average performance and improving worst-case performance are not the same thing. Nor is improving the performance of the most efficient program possible the same as improving the performance of the program you are likely to write given your budget. For example, it may be the case that a low-level language would yield a faster program given virtually unlimited resources, yet a higher-level language with less deterministic optimisation would yield a faster program if you have a more limited budget. Put another way, it may be cheaper to get to 100% of the maximal possible performance in language A, but cheaper to get to 97% in language B. If you don't need more than 97%, language B is the "faster language" from your perspective, as the programs you can actually afford to write will be faster.

> Also, the more I work with this stuff the more I think trying to avoid memory management is foolish.

It's not about avoiding thinking about memory management but about finding good memory management algorithms for your target definition of "good". Tracing garbage collectors offer a set of very attractive algorithms that aren't always easy to match (when it comes to throughput, at least, and in some situations even latency) and offer a knob that lets you trade footprint for speed. More manual memory management, as well as refcounting collectors, often tends to miss that sweet spot, as both tend to optimise for footprint over throughput. See this great talk on the RAM/CPU tradeoff from this year's ISMM (International Symposium on Memory Management): https://youtu.be/mLNFVNXbw7I. It focuses on tracing collectors, but the point applies to all memory management solutions.

> Should have noted that Zig solves this by making the convention be to pass an allocator in to any function that allocates. So the boundaries/responsibilities become very clear.

Yes, and arenas may give such usage patterns a similar CPU/RAM knob to tracing collectors, but this level of control isn't free. In the end you have to ask yourself whether what you're gaining is worth the added effort.
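For illustration, a minimal sketch of that convention and of the arena knob, assuming recent Zig std APIs (std.mem.Allocator, std.heap.ArenaAllocator, std.fmt.allocPrint); makeGreeting is a made-up example function, not anyone's real API:

    const std = @import("std");

    // Convention: any function that allocates takes an allocator parameter,
    // so the caller chooses the strategy and owns the cleanup.
    // (makeGreeting is a hypothetical example.)
    fn makeGreeting(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
        return std.fmt.allocPrint(allocator, "Hello, {s}!", .{name});
    }

    pub fn main() !void {
        // An arena turns many small allocations into one bulk free:
        // memory is held for the arena's whole lifetime (more footprint)
        // in exchange for cheaper allocation and no per-object frees.
        var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
        defer arena.deinit(); // everything allocated from the arena is freed here

        const greeting = try makeGreeting(arena.allocator(), "world");
        std.debug.print("{s}\n", .{greeting});
    }

The caller could just as easily pass a general-purpose or fixed-buffer allocator instead; the callee doesn't change.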
rudedogg 2 days ago
I enjoy reading your comments here. Thanks for sharing your knowledge, I'll watch the talk.

> Yes, and arenas may give such usage patterns a similar CPU/RAM knob to tracing collectors, but this level of control isn't free. In the end you have to ask yourself if what you're gaining is worth the added effort.

For me, using them has been very easy/convenient. My earlier attempts with Zig used alloc/defer free everywhere, and it required a lot of thought not to make mistakes. But on my latest project I'm using arenas and it's much more straightforward.
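For what it's worth, the difference between the two styles looks roughly like this (a hypothetical sketch, assuming recent Zig std APIs; withDeferFree and withArena are made-up names):

    const std = @import("std");

    // Style 1: every allocation is paired with its own deferred free.
    fn withDeferFree(allocator: std.mem.Allocator) !void {
        const a = try allocator.alloc(u8, 64);
        defer allocator.free(a);
        const b = try allocator.alloc(u8, 128);
        defer allocator.free(b);
        @memset(a, 0); // pretend work so the buffers are actually used
        @memset(b, 0);
    }

    // Style 2: an arena; all allocations share one lifetime and one bulk free.
    fn withArena(base: std.mem.Allocator) !void {
        var arena = std.heap.ArenaAllocator.init(base);
        defer arena.deinit(); // frees everything allocated below at once
        const a = try arena.allocator().alloc(u8, 64);
        const b = try arena.allocator().alloc(u8, 128);
        @memset(a, 0);
        @memset(b, 0);
    }

    pub fn main() !void {
        try withDeferFree(std.heap.page_allocator);
        try withArena(std.heap.page_allocator);
    }

The arena version trades a bit of extra memory held until deinit for not having to track each allocation individually.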