Memory Subsystem Optimizations (johnnysswlab.com)
36 points by mfiguiere 6 hours ago | 7 comments
adsharma 2 hours ago

18 blog posts and very limited mention of NUMA and HT (hyper-threading)?

https://adsharma.github.io/more-performance-hints/

jeffbee 5 hours ago

I find this site interesting because of its mixture of good topic choice and inaccurate details. I think it's generated by LLMs.

Specifically catching my eye in this collection of articles is the highly misleading one about huge pages. All recent Linux distributions have THP set to "madvise" by default. Many programs exploit THP automatically, including any Go program and any JVM program with a flag set. The tcmalloc shared library that comes with Ubuntu is probably the single worst way to experience huge pages. Mi-malloc is the better choice if you must preload a library, but there are even better choices. Explicit huge pages are little-used because managing them is annoying. Finally, the latest Linux kernels have features called "folios" and "mTHP" that make THP even smoother.
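For readers unfamiliar with the "madvise" THP mode: in that mode the kernel only considers regions that a program has explicitly marked with `madvise(MADV_HUGEPAGE)`. The following is a minimal Linux-only sketch in C++, assuming a 2 MiB THP size as on x86-64. `ThpRegion`, `alloc_thp_hinted` and `free_thp` are hypothetical names for illustration, and the kernel may still decline to back the region with huge pages.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: 2 MiB is the THP size, as on x86-64; other architectures differ.
constexpr std::size_t kHugePage = 2u * 1024 * 1024;

// Hypothetical helper type, for illustration only.
struct ThpRegion {
    void* base = nullptr;    // start of the whole mapping (what munmap needs)
    std::size_t mapped = 0;  // length of the whole mapping
    void* data = nullptr;    // 2 MiB-aligned start handed to the caller
    std::size_t len = 0;     // usable length, a multiple of kHugePage
    bool advised = false;    // whether madvise(MADV_HUGEPAGE) was accepted
};

// Map anonymous memory and mark it as a THP candidate. When
// /sys/kernel/mm/transparent_hugepage/enabled is "madvise", only regions
// marked like this are eligible for transparent huge pages.
ThpRegion alloc_thp_hinted(std::size_t bytes) {
    ThpRegion r;
    r.len = (bytes + kHugePage - 1) / kHugePage * kHugePage;
    // Over-map by one huge page so that a 2 MiB-aligned start fits inside.
    r.mapped = r.len + kHugePage;
    void* p = mmap(nullptr, r.mapped, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return ThpRegion{};
    r.base = p;
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t aligned =
        (addr + kHugePage - 1) & ~static_cast<std::uintptr_t>(kHugePage - 1);
    r.data = reinterpret_cast<void*>(aligned);
    // Only a hint: it fails (EINVAL) on kernels built without THP support,
    // and the memory still works with ordinary 4 KiB pages either way.
    r.advised = madvise(r.data, r.len, MADV_HUGEPAGE) == 0;
    return r;
}

void free_thp(ThpRegion& r) {
    if (r.base != nullptr) munmap(r.base, r.mapped);
    r = ThpRegion{};
}
```

Aligning the start to 2 MiB matters because the kernel can only install a huge page for a 2 MiB-aligned virtual range; an unaligned region may end up with fewer huge pages than expected.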

kev009 3 hours ago

The huge page article is consistent with official documentation such as https://docs.redhat.com/en/documentation/red_hat_enterprise_.... THP can only provide pages of up to 2MB on amd64, so it is not necessarily a silver bullet for large, long-lived memory consumers like a database or a garbage-collected language runtime, and the older methods are worth knowing about.
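The older method referred to here is presumably explicit huge pages (hugetlbfs), which is also the route to 1 GiB pages. Below is a minimal Linux-only sketch, assuming a 2 MiB default huge page size; `HugeMapping`, `map_huge_or_fallback` and `unmap_huge` are hypothetical names. Requesting 1 GiB pages would additionally need the `MAP_HUGE_1GB` size flag from `<linux/mman.h>` and a pool reserved for that size, which this sketch does not attempt.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption: the default hugetlb page size is 2 MiB (typical on x86-64).
constexpr std::size_t kHugePage = 2u * 1024 * 1024;

// Hypothetical helper type, for illustration only.
struct HugeMapping {
    void* ptr = nullptr;
    std::size_t len = 0;
    bool explicit_huge = false;  // true if served from the hugetlb pool
};

// Ask for explicit huge pages with MAP_HUGETLB. These come from a pool an
// administrator must reserve in advance (e.g. via /proc/sys/vm/nr_hugepages).
// For a private mapping the pages are reserved at mmap time, so an empty or
// too-small pool makes mmap fail and we fall back to ordinary pages.
HugeMapping map_huge_or_fallback(std::size_t bytes) {
    HugeMapping m;
    m.len = (bytes + kHugePage - 1) / kHugePage * kHugePage;
    if (m.len == 0) return m;
    void* p = mmap(nullptr, m.len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        m.ptr = p;
        m.explicit_huge = true;
        return m;
    }
    // Fallback: ordinary anonymous memory (THP may still apply to it).
    p = mmap(nullptr, m.len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return HugeMapping{};
    m.ptr = p;
    return m;
}

void unmap_huge(HugeMapping& m) {
    if (m.ptr != nullptr) munmap(m.ptr, m.len);
    m = HugeMapping{};
}
```

The need to size the pool up front, outside the program, is part of why explicit huge pages are considered annoying to manage.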

To me they look like marketing posts, but as a quick intro to various topics they aren't devoid of effort or meaning.

hairband_dude 2 hours ago

It's been around for a while: https://web.archive.org/web/20230602031306/https://johnnyssw.... Not sure if the newer articles are LLM/AI assisted though.

foltik 4 hours ago

> Mi-malloc is the better choice if you must preload a library, but there are even better choices.

What’s a better choice?

jeffbee 4 hours ago

Linking the allocator into your program when you build it, instead of overriding just malloc and free at runtime. Then you can choose between jemalloc, mi-malloc, TCMalloc, or whatever you please, and get better features such as C++ sized delete. Rust makes this easy, for example with "use tcmalloc_better::TCMalloc".
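To make the sized-delete point concrete: since C++14 the compiler may call `operator delete(void*, std::size_t)` with the object's size, information that a runtime malloc/free interposer never receives because `free(void*)` takes only a pointer. The sketch below replaces the global operators inside the program itself and simply counts calls, with `malloc`/`free` standing in for a real allocator; an allocator such as TCMalloc can use the size to avoid looking it up. `Widget` and the counter names are hypothetical. GCC emits sized deallocation by default in C++14 and later; some Clang versions need `-fsized-deallocation`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters standing in for an allocator linked into the program.
std::atomic<std::size_t> g_sized_deletes{0};
std::atomic<std::size_t> g_unsized_deletes{0};
std::atomic<std::size_t> g_last_sized_bytes{0};

void* operator new(std::size_t n) {
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc{};
}

// Unsized form: like free(), it only gets the pointer.
void operator delete(void* p) noexcept {
    if (p != nullptr) g_unsized_deletes.fetch_add(1, std::memory_order_relaxed);
    std::free(p);
}

// Sized form (C++14): the compiler passes the object size, so a real
// allocator could skip determining the size class from the pointer.
void operator delete(void* p, std::size_t n) noexcept {
    if (p != nullptr) {
        g_sized_deletes.fetch_add(1, std::memory_order_relaxed);
        g_last_sized_bytes.store(n, std::memory_order_relaxed);
    }
    std::free(p);
}

// Hypothetical example type.
struct Widget {
    char payload[48];
};
```

Usage: allocate with `new Widget{}`, release with `delete`, and compare the counters; with sized deallocation enabled, `g_last_sized_bytes` ends up equal to `sizeof(Widget)`.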

matu3ba 2 hours ago

The blog looks nice, especially the simple-to-understand numbers. To me, though, the memory subsystem articles are missing the spicier pieces like platform semantics, barriers, and de-virtualization (the latter is discussed in an article separate from the series). In the other articles I'd also expect a brief discussion of debugging format trade-offs (DWARF vs. ORC vs. alternatives), virtualization performance, and relocation effects, but I could not find them. A few C++ articles are also missing: 1. cache-friendly structures in C++, ideally with performance numbers, because standard containers like std::map are unfortunately not written to be cache-friendly (only std::vector, and std::deque<T> with a high enough block_size, are); 2. what to use for destructive moves, or how to roll your own (they did not make it into C++26).
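As an illustration of the cache-friendly alternative to std::map: a common approach is a "flat map" that keeps key/value pairs sorted in one contiguous std::vector, so a lookup is a binary search over a single array rather than pointer-chasing through separately allocated tree nodes. This is a minimal sketch with a hypothetical `FlatMap` class; C++23 adds `std::flat_map` (header `<flat_map>`) where the standard library ships it, and `boost::container::flat_map` is another existing option. No performance numbers are claimed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sorted-vector map, for illustration only.
template <class K, class V>
class FlatMap {
public:
    // Insert a new key or overwrite an existing one. Insertion is O(n)
    // because later elements shift; that is the usual trade-off for
    // contiguous storage and cheaper lookups.
    void insert_or_assign(const K& key, V value) {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, key_less);
        if (it != data_.end() && !(key < it->first)) {
            it->second = std::move(value);                   // key already present
        } else {
            data_.insert(it, Entry(key, std::move(value)));  // keeps sorted order
        }
    }

    // Binary search over contiguous memory; nullptr if the key is absent.
    const V* find(const K& key) const {
        auto it = std::lower_bound(data_.begin(), data_.end(), key, key_less);
        if (it != data_.end() && !(key < it->first)) return &it->second;
        return nullptr;
    }

    std::size_t size() const { return data_.size(); }

private:
    using Entry = std::pair<K, V>;

    // Comparator for lower_bound: element on the left, searched key on the right.
    static bool key_less(const Entry& e, const K& k) { return e.first < k; }

    std::vector<Entry> data_;  // always sorted by key
};
```

The design fits read-mostly tables well; for heavy interleaved inserts a tree or hash map may still win, which is exactly where measured numbers would help.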