Remix.run Logo
patrakov 8 hours ago

1. Thanks for partially (in paragraph 4 but not paragraph 5) preempting the obvious objection. Distinguishing between disk reads and writes is very important for consumer SSDs, and you quoted exactly the right metric in paragraph 4: reduction of writes, almost regardless of the total I/O. Reads without writes are tolerable. Writes stall everything badly.

2. The comparison in paragraph 4 is between no-swap and zswap, and the results are plausible. But the relevant comparison here is a three-way one, between no-swap, zram, and zswap.

3. It's important to tune earlyoom "properly" when using zram as the only swap. Setting the "-m" argument too low causes earlyoom to miss obvious overloads that thrash the disk through page cache and memory-mapped files. On the other hand, with earlyoom, I could not find the right balance between unexpected OOM kills and missing the brownouts, simply because, with earlyoomd, the usage levels of RAM and zram-based swap are the only signals available for a decision. Perhaps systemd-oomd will fare better. The article does mention the need for tuning the userspace OOM killer to an uncomfortable degree.

I have already tried zswap with a swap file on a bad SSD, but, admittedly, not together with earlyoomd. With an SSD that cannot support even 10 MB/s of synchronous writes, it browns out, while zram + earlyoomd can be tuned not to brown out (at the expense of OOM kills on a subjectively perfectly well performing system). I will try backing-store-less zswap when it's ready.

And I agree that, on an enterprise SSD like Micron 7450 PRO, zswap is the way to go - and I doubt that Meta uses consumer SSDs.