| ▲ | Modernizing Linux swapping: introducing the swap table (lwn.net) |
| 76 points by chmaynard 15 hours ago | 113 comments |
| |
|
| ▲ | chmaynard 4 hours ago | parent | next [-] |
| Second installment: https://news.ycombinator.com/item?id=46901662 |
|
| ▲ | FooBarWidget 13 hours ago | parent | prev | next [-] |
One pet peeve I have with virtual memory management on Linux is that, as memory usage approaches 100%, the kernel starts evicting executable pages because technically they're read-only and can be loaded from disk. Thus, the entire system grinds to a halt in a behavior that looks like swapping, because every program that wants to execute instructions has to load its instructions from disk again, only to have those instruction pages be evicted again when context switching to another program. This behavior is especially counterintuitive because disabling swap does not prevent this problem. There are no convenient settings for administrators to prevent this problem. It's good that we have better swapping now, but I wish they'd address the above. I'd rather have programs getting OOM-killed or throwing errors before the system grinds to a halt, where I can't even ssh in and run 'ps'. |
| |
| ▲ | Rygian 11 hours ago | parent | next [-] | | I suffer from the same behavior, ever since I moved from Ubuntu to Debian. An interactive system that does not interact (terminal not reactive, can't ssh in, screen does not refresh) is broken. I don't understand why this is not a kernel bug. On my system, to add insult to injury, when the system does come back twenty minutes later, I get a "helpful" pop-up from the Linux Kernel saying "Memory Shortage Avoided". Which is just plain wrong. The pop-up should say "sorry, the kernel bricked your system for a solid twenty minutes for no good reason, please file a report". | |
| ▲ | man8alexd 12 hours ago | parent | prev | next [-] | | Actively used executable pages are explicitly excluded from reclaim. And if they are not used, why should they stay in memory when the memory is constrained? It is not the first time I have heard complaints about executable pages, but it seems to be some kind of common misunderstanding. https://news.ycombinator.com/item?id=45369516 | | |
| ▲ | FooBarWidget 9 hours ago | parent [-] | | What is "actively used"? The bash session that I was using 2 seconds before the system ground to a halt sure didn't count. | | |
| ▲ | fulafel 7 hours ago | parent | next [-] | | The set of possibly blocking operations for interactively using bash is big. Executable pages of the bash executable are far from the only thing that could be missing. If the machine is swap thrashing, all I/O goes to the same congested queue. .bash_history read or write access, memory allocation, stuff your terminal program does, stuff your Wayland compositor or X11 stack does, bash accessing data in its memory that has been swapped out, etc. And each of those could be waiting for a while to issue its I/O request, since the I/O system is flooded by swap I/O. There should be a tool that can show the interdependent graph of pending, blocking I/O operations. | |
| ▲ | man8alexd 9 hours ago | parent | prev [-] | | Your bash session is most likely still in memory, but the system is spending 99.9% of its time waiting for swap I/O, trying to free a few pages of memory. Swap random-access latency is 10^3 times that of RAM. | | |
|
| |
| ▲ | robinsonb5 13 hours ago | parent | prev | next [-] | | Indeed. I think what's really needed is some way to mark pages as "required for interactivity" so that nothing related to the user interface gets paged out, ever. That, I think, would go at least some way towards restoring the feeling of "having a computer's full attention" that we had thirty years ago. | | |
| ▲ | akdev1l 12 hours ago | parent | next [-] | | Seems the applications can call mlockall() to do this | |
| ▲ | direwolf20 10 hours ago | parent | prev | next [-] | | An Electron app would mark its entire 2GB as required for interactivity. If you run 4 electron apps on an 8GB system you run out of memory. | | |
| ▲ | robinsonb5 10 hours ago | parent [-] | | I don't mean interactivity within apps, per se - I mean the desktop and underlying OS, so that if an electron app goes unresponsive and eats all the free RAM the window manager can still kill it. Or you can still open a new terminal window, log in and kill it. Right now it can take several minutes to get a Linux system back under control once a swapstorm starts. | | |
| ▲ | SAI_Peregrinus 6 hours ago | parent | next [-] | | Linux doesn't really have any distinction between the desktop and underlying OS components in userspace and anything else in userspace. Linux is quite userland-agnostic, and distros have traditionally mixed user software with distro-managed software. You shouldn't use `sudo` to install software by default, your package manager should allow installing software for just your user. Software installed for the system could then be the only software allowed to mark itself as required for interactivity. You could do that manually to other software if you had root access, but "normal" user software installs with the package manager couldn't do so since they wouldn't get root access. That'd require some new capabilities added, and some substantial shifts in how distro maintainers & users operate, so it's extremely unlikely. It's much closer to how things like Android operate, though still not quite as secure as giving each application its own user & dedicated storage for data. | |
| ▲ | M95D 10 hours ago | parent | prev [-] | | Alt+[SysRq,f] to invoke the OOM killer, or Alt+[SysRq,h] for help | |
| ▲ | dingaling 15 minutes ago | parent [-] | | No effect, captain. In 30 years of using desktop Linux I've never been able to interrupt a swapstorm. The only way out is long-press the power button. |
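(For reference: the SysRq combos above only work if the relevant functions are enabled in the kernel.sysrq bitmask, which many distributions restrict by default. A minimal sketch:)

    cat /proc/sys/kernel/sysrq     # current bitmask; 1 means all functions are allowed
    sysctl kernel.sysrq=1          # allow everything, including Alt+SysRq+F (invoke the OOM killer)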
|
|
| |
| ▲ | FooBarWidget 9 hours ago | parent | prev [-] | | There is, mlock() or mlockall(), but it requires developer support. I wish there were an administrator knob that let me mark whole processes without needing to modify them. | |
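(The closest existing knob is arguably cgroup v2 memory protection rather than mlock: pages within a cgroup's memory.min budget are excluded from reclaim. A hedged sketch, using a hypothetical myapp.service as the unit name:)

    systemctl set-property myapp.service MemoryMin=512M    # protect up to 512M of this service from reclaim
    # raw cgroup equivalent, assuming the unified hierarchy at /sys/fs/cgroup:
    echo $((512*1024*1024)) > /sys/fs/cgroup/system.slice/myapp.service/memory.min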
| |
| ▲ | nolist_policy 12 hours ago | parent | prev | next [-] | | Linux swap has been fixed on Chromebooks for years thanks to MGLRU. It's upstream since Linux 6.1 and you can try it with echo y >/sys/kernel/mm/lru_gen/enabled
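(MGLRU also exposes an anti-thrashing knob, which is part of what ChromeOS relies on; a sketch, assuming a kernel built with CONFIG_LRU_GEN:)

    echo 1000 >/sys/kernel/mm/lru_gen/min_ttl_ms    # don't evict the last second's working set; OOM-kill instead of thrashing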
| | | |
| ▲ | 112233 12 hours ago | parent | prev | next [-] | | Is there a way to make the Linux kernel schedule in a "batch friendly way"? Say I do "make -j" and get 200 gcc processes doing a jobserver LTO link with 2GB RSS each. In my head, the optimal way through such a mess is to get as many processes as can fit into RAM without swapping, run them to completion, and schedule additional processes as resources become available. A depth-first, "infinite latency" mode. Any combination of cgroups, /proc flags and other forbidden knobs to get such behaviour? | | |
| ▲ | Neywiny 10 hours ago | parent | next [-] | | "make -j" has OOM'd me more often than it's worth. If it's a big project I just put in how many threads I want. I do hear your point, but that is a solved problem. | |
| ▲ | 112233 9 hours ago | parent [-] | | Actually, a global jobserver is another unsolved thing that it seems unbelievable nobody has done yet. You have a server. The server spins up N containers (kubes, dockers, multiple user sessions ...), each of them building something. There is no mechanism to run a batch of tasks in parallel in a way that uses the available cores. Some special cases (make/ninja/gcc) work, but no general mechanism I know of |
| |
| ▲ | direwolf20 10 hours ago | parent | prev [-] | | It's not possible for the kernel to predict the memory needs of a process unfortunately | | |
| ▲ | 112233 9 hours ago | parent | next [-] | | But how about not scheduling swapped-out processes if there currently is no free RAM for their current RSS? Of course the kernel cannot know that a new process will balloon to eat all RAM, but once it has done so, is there a way to let it run to completion without being swapped out to "improve responsiveness"? | |
| ▲ | man8alexd 8 hours ago | parent [-] | | There is no actual swapping of whole processes in modern kernels. Nowadays it is paging: the kernel pages out individual unused memory pages, not entire processes, so it keeps all non-blocked processes running with only the necessary pages in memory. |
| |
| ▲ | man8alexd 10 hours ago | parent | prev [-] | | It is possible to measure process memory utilisation and set appropriate cgroup limits. |
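(A sketch of one way to do that with systemd's cgroup wrapper; the numbers are placeholders rather than recommendations, and you may need root or a delegated user slice:)

    systemd-run --scope -p MemoryMax=16G -p MemorySwapMax=2G make -j8    # cap the whole build's RAM and swap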
|
| |
| ▲ | rustyhancock 10 hours ago | parent | prev | next [-] | | This explains some of the issues I was having on a laptop some months back, and searching desperately for a "just kill the damn thing" option. | | |
| ▲ | worldsavior 12 hours ago | parent | prev | next [-] | | Program instruction size is small and loading is fast, so no need to worry about that too much. I'd look at other things first. | | |
| ▲ | AtlasBarfed 7 hours ago | parent | prev [-] | | Is it as bad with an SSD? |
|
|
| ▲ | ChocolateGod 12 hours ago | parent | prev | next [-] |
| I'd like to see Linux gain support for actual memory compression, without the need to go through zram, similar to macOS/Windows. |
| |
| ▲ | homebrewer 11 hours ago | parent | next [-] | | zram has been "obsolete" for years, I don't know why people still reach for it. Linux supports proper memory compression in the form of zswap https://wiki.archlinux.org/title/Zswap | | |
| ▲ | 8 hours ago | parent | next [-] | | [deleted] | |
| ▲ | RealStickman_ 10 hours ago | parent | prev | next [-] | | I didn't realize zswap also uses in-memory compression. It might be a combination of poor naming and zram being continuously popular. | |
| ▲ | rascul 8 hours ago | parent | prev | next [-] | | It is not obsolete. It's also useful for other things. | |
| ▲ | ChocolateGod 11 hours ago | parent | prev | next [-] | | Because I'd rather compress RAM when running low on memory than swap to my disks. zram is also the default on some distros (e.g. Fedora). | |
| ▲ | homebrewer 11 hours ago | parent [-] | | Did you read the link? Additional disk swap is optional, and if for some reason you would still like to have one, it's easy to disable writeback, using just the RAM. And even if one enables zswap and configures nothing else, compressing RAM and only swapping out to disk under extreme pressure is still the default behavior. |
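(For reference, a minimal zswap setup looks roughly like this; parameter names are from the kernel's zswap documentation, the values are only examples, and zswap still fronts an existing swap device:)

    echo 1 > /sys/module/zswap/parameters/enabled
    echo zstd > /sys/module/zswap/parameters/compressor
    echo 20 > /sys/module/zswap/parameters/max_pool_percent    # cap the compressed pool at 20% of RAM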
| |
| ▲ | 8 hours ago | parent | prev [-] | | [deleted] |
| |
| ▲ | JamesTRexx 11 hours ago | parent | prev | next [-] | | I use zswap, which is a non-fixed intermediate layer between RAM and swap and worked great on my old laptop which had a max of 4GB RAM. Even use it now on my current 32GB laptop. Full compression would be nicer, but I'd also like to see ECC emulation (or alternative) as a cheaper alternative to the real hardware, although with current prices that might be less so. | |
| ▲ | mkurz 11 hours ago | parent | prev [-] | | zswap? | | |
|
|
| ▲ | dist-epoch 12 hours ago | parent | prev | next [-] |
Both Canonical and Microsoft recommend enabling a swap file for Ubuntu cloud images, even if you allocate plenty of RAM to the VM. Any thoughts on that? |
| |
| ▲ | man8alexd 12 hours ago | parent | next [-] | | https://chrisdown.name/2018/01/02/in-defence-of-swap.html | | |
| ▲ | AtlasBarfed 7 hours ago | parent [-] | | I get we want a magical OS that reads minds and abstracts itself from users, but these days low-memory situations are technical events for people trying to optimize resource use and balance uptime. So not being able to mark app processes as "kill me first" and leave others like ssh or bash up is a missing feature. Shouldn't the OS have a bunch of auto hooks to invoke for machines under duress? Yes, you might be able to do it in userspace, but .. userspace will probably get unpredictably nuked in stress situations. The WMs should have a "freeze all desktop apps except this one shell window" mode. Didn't NeXT have that? Having some swap basically required has always seemed like a smell. A legacy from the "640k, oops, wasn't enough" days. I can see an emergency memory swap system as a feature, which it currently isn't... | |
| ▲ | man8alexd 7 hours ago | parent [-] | | > So not being able to mark apps processes as kill me first and leave others like ssh bash up is a missing feature. systemd OOMScoreAdjust has existed for at least a decade. /proc/*/oom_adj since Linux kernel 2.6.11 - two decades. cgroups are also two decades old. SSH by default has OOMScoreAdjust=-1000 and is entirely protected from OOM. |
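(A sketch of those knobs; <pid> and the unit name are placeholders:)

    echo 500 > /proc/<pid>/oom_score_adj    # positive = preferred OOM victim; -1000 = never killed
    # per-service, in a systemd unit file:
    #   [Service]
    #   OOMScoreAdjust=-500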
|
| |
| ▲ | secondcoming 10 hours ago | parent | prev | next [-] | | would that mean swapping involves network calls? | |
| ▲ | IshKebab 11 hours ago | parent | prev [-] | | Yeah because Linux's memory management is quite poor and running out of RAM without swap will often mean a hard reboot. Swap definitely helps a lot, even if it doesn't fully solve the problem. To be honest I don't know why it's such an issue on Linux. Mac and Windows don't have this issue at all. Windows presumably because it doesn't over-commit memory. I'm not sure why Mac is so much better than Linux at memory management. My eventual solution was to just buy a PC with a ton of RAM (128 GB). Haven't had any hard reboots due to OOM since then! | | |
| ▲ | magicalhippo 10 hours ago | parent | next [-] | | > To be honest I don't know why it's such an issue on Linux. edit: I wrote all this before realizing I overlooked that you answered it yourself, so below is my very elaborate explanation of what you said: > Windows presumably because it doesn't over-commit memory. I'm no expert but from what I've gathered this ultimately boils down to how Linux went with fork for multiprocessing, vs Windows focused on threads. With fork, you clone the process. Since it's a clone it gets a copy of all the memory of the parent process. To make fork faster and consume less physical memory, Linux went with copy-on-write for the process' memory. This avoids an expensive copy, and also avoids duplicating memory which will only be read. The downside is that Linux has no idea how much of the memory shared with the clone that the clone or the parent will modify after the fork call. If the clone just does a small job and exits quickly, neither it or the parent will modify a lot of pages, thus most of them are never actually copied. The fastest work is the work you never perform, so this is indeed fast. However, in some cases the clone is long-lived and thus a lot of memory might eventually end up getting copied. Well, Linux needs to back those copies with physical memory, and so if there's not enough physical memory around it has to evict something. While Linux scrambles to perform the copy, the process which triggers it has to wait. AFAIK one can configure Linux to reserve physical memory for a worst-case scenario where it has to copy all the cloned memory. However in almost all normal cases, this grossly overestimates the required memory and thus leads to swapping when technically it is not needed. On Windows this is very different. Instead of spawning a cloned process to do extra work, you spawn a thread. And all threads belonging to a process shares the same memory. Thus there is no need to clone memory, no need for the copy-on-write optimization, and thus Windows has much better knowledge about how much free physical memory it actually has to work with. Of course a thread on Windows can still allocate a huge amount of memory and trigger swapping that way, but Windows will never suddenly be in a situation where it then also needs to scramble to copy some shared pages. | | |
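(The configuration alluded to above is the overcommit accounting mode; a sketch, with the caveat from the reply below that it changes allocation accounting, not reclaim or swapping behaviour:)

    sysctl vm.overcommit_memory=2    # strict accounting: commit limit = swap + overcommit_ratio% of RAM
    sysctl vm.overcommit_ratio=80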
| ▲ | man8alexd 9 hours ago | parent [-] | | > However in almost all normal cases, this grossly overestimates the required memory and thus leads to swapping when technically it is not needed. This is not true. Disabling overcommit doesn't change reclaim and swapping behaviour and doesn't lead to unnecessary swapping. | | |
| |
| ▲ | direwolf20 10 hours ago | parent | prev | next [-] | | My experience is different. Running out of RAM without swap will cause the most memory–hungry process to die, whereupon systemd restarts it. Running out of RAM with swap causes thrashing and you can't serve any requests or ssh logins. Someone has to press the reset button then. | | |
| ▲ | rustyhancock 10 hours ago | parent [-] | | I suppose there are two common scenarios roughly. If I did something (like try and decompress an archive) and I run out of memory I want that process to be killed. If my system/config is simply not up to scratch and the normal services are causing thrashing that needs to be addressed directly and OOM kill isn't intended to help I don't think. |
| |
| ▲ | NekkoDroid 10 hours ago | parent | prev [-] | | > To be honest I don't know why it's such an issue on Linux. Mac and Windows don't have this issue at all. Windows presumably because it doesn't over-commit memory To be fair, my Windows system grinds to a halt (not really, but it becomes very noticeably less responsive in basically anything) when JetBrains is installing an update (mind you I only have SSDs with all JetBrains stuff being on an NVMe). I don't know what JetBrains is doing, but it consistently makes itself noticeable when it is updating. | |
| ▲ | IshKebab 5 hours ago | parent [-] | | I have had this happen in the past (not very often though), and another saving grace of Windows is you can press Ctrl-Alt-Del, which somehow seems to pause the rest of the system activity, and then see a process list and choose which one to kill. Linux doesn't have anything like that. KDE seems to have a somewhat functional Ctrl-Alt-Del menu - I have been able to access it when the rest of the shell gets screwed up (not due to OOM). But inexplicably the only options it has are Sleep, Restart, Shutdown or Log out!! Where is the "emergency shell", or "process manager" or even "run a program"? Ridiculous. I think Linux GUIs often have this weird fetish with designing as if nothing will ever go wrong, which is clearly not how the real world works. Especially on Linux. I've genuinely heard people claim that most Linux users will never need to use a terminal for example. |
|
|
|
|
| ▲ | iberator 13 hours ago | parent | prev [-] |
Another useless feature in the Linux kernel. Who uses swap space nowadays?! The last time I used swap on a Linux device was around the Pentium 2 era, but in reality closer to the 486DX era
| |
| ▲ | Titan2189 13 hours ago | parent | next [-] | | We use it in production.
Workloads with unpredictable memory usage (32MB to 4GB per process), but we also want to start enough processes to saturate the CPU.
Before we configured & enabled swap we were either sitting at low CPU utilisation or OOM | |
| ▲ | ch_123 12 hours ago | parent | prev | next [-] | | I ran Linux without swap for some years on a laptop with a large-for-the-time amount of RAM (about 8GB). It _mostly_ worked, but sudden spikes of memory usage would render the system unresponsive. Usually it would recover, but in some cases it required a power cycle. Similarly, on a server where you might expect most of the physical memory to get used, it ends up being very important for stability. Think of VM or container hosts in particular. | |
| ▲ | GCUMstlyHarmls 12 hours ago | parent | next [-] | | I don't get why anti-swap is so prevalent in Linux discussions. Like, what does it hurt to stick 8-16-32GB of extra "oh fuck" space on your drive? Either you never exhaust your system RAM, so it doesn't matter; you minimally exhaust it and swap during some peak load, but at least nothing goes down; or you exhaust it all and start having things get OOM'd, which feels bad to me. Am I out of touch? Surely it's the children who are wrong. | |
| ▲ | manuel_w 12 hours ago | parent | next [-] | | The pro-swap stance has never made sense to me because it feels like a logical loop. There’s a common rule of thumb that says you should have swap space equal to some multiple of your RAM. For instance, if I have 8 GB of RAM, people recommend adding 8 GB of swap. But since I like having plenty of memory, I install 16 GB of RAM instead—and yet, people still tell me to use swap. Why? At that point, I already have the same total memory as those with 8 GB of RAM and 8 GB of swap combined. Then, if I upgrade to 24 GB of RAM, the advice doesn’t change—they still insist on enabling swap. I could install an absurd amount of RAM, and people would still tell me to set up swap space. It seems that for some, using swap has become dogma. I just don’t see the reasoning. Memory is limited either way; whether it’s RAM or RAM + swap, the total available space is what really matters. So why insist on swap for its own sake? | | |
| ▲ | viraptor 12 hours ago | parent | next [-] | | You're mashing together two groups. One claims having swap is good actually. The other claims you need N times ram for swap. They're not the same group. > Memory is limited either way; whether it’s RAM or RAM + swap For two reasons: usage spikes and actually having more usable memory. There's lots of unused pages on a typical system. You get free ram for the price of cheap storage, so why wouldn't you? | |
| ▲ | man8alexd 12 hours ago | parent | prev | next [-] | | This rule of thumb is outdated by two decades. The proper rule of thumb is to make the swap large enough to keep all inactive anonymous pages after the workload has stabilized, but not too large to cause swap thrashing and a delayed OOM kill if a fast memory leak happens. | | |
| ▲ | tremon 12 hours ago | parent [-] | | That's not useful as a rule of thumb, since you can't know the size of "all inactive anonymous pages" without doing extensive runtime analysis of the system under consideration. That's pretty much the opposite of what a rule of thumb is for. | | |
| ▲ | man8alexd 11 hours ago | parent [-] | | You are right, it is not a rule of thumb, and you can't determine optimal swap size right away. But you don't need "extensive runtime analysis". Start with a small swap - a few hundred megabytes (assuming the system has GBs of RAM). Check its utilization periodically. If it is full, add a few hundred megabytes more. That's all. | | |
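(Checking utilization is a one-liner; a sketch:)

    swapon --show    # size and usage of each swap area
    free -h          # overall RAM/swap picture
    vmstat 1         # the si/so columns show ongoing swap-in/swap-out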
| ▲ | ZoomZoomZoom 10 hours ago | parent [-] | | It's not like it's easy to shuffle partitions around. Swap files are a pain, so you need to reserve space at the end of the table. By the time you need to increase swap the previous partition is going to be full. Better overcommit right away and live with the feeling you're wasting space. | | |
| ▲ | rascul 8 hours ago | parent | next [-] | | > Swap files are a pain Easier than partitions: mkswap --size 2G --file swap.img
swapon swap.img
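(If your mkswap lacks --file/--size, the long-standing equivalent, assuming a filesystem such as ext4, is:)

    fallocate -l 2G swap.img && chmod 600 swap.img
    mkswap swap.img && swapon swap.img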
| |
| ▲ | man8alexd 10 hours ago | parent | prev | next [-] | | Exactly the opposite. Don't use swap partitions; use swap files, even multiple ones if necessary. Never allocate too much swap space. It is better to get an OOM kill earlier than to wait on an unresponsive system. |
| ▲ | direwolf20 10 hours ago | parent | prev [-] | | Hast thou discovered our lord and savior LVM? |
|
|
|
| |
| ▲ | dspillett 11 hours ago | parent | prev | next [-] | | > There’s a common rule of thumb that says you should have swap space equal to some multiple of your RAM. That rule came about when RAM was measured in a couple of MB rather than GB, and hasn't made sense for a long time in most circumstances (if you are paging out a few GB of stuff on spinning drives your system is likely to be stalling so hard due to disk thrashing that you hit the power switch, and on SSDs you are not-so-slowly killing them due to the excess writing). That doesn't mean it isn't still a good idea to have a little allocated just-in-case. And as RAM prices soar while IO throughput & latency are low, we may see larger Swap/RAM ratios being useful again as RAM sizes are constrained but working sets aren't getting any smaller. In a theoretical ideal computer, which the actual designs we have are leaky-abstraction laden implementations of, things are the other way around: all the online storage is your active memory and RAM is just the first level of cache. That ideal hasn't historically ended up being what we have because the disparities in speed & latency between other online storage and RAM have been so high (several orders of magnitude), fast RAM has been volatile, and hardware & software designs are not stable & correct enough such that regular complete state resets are necessary. > Why? At that point, I already have the same total memory as those with 8 GB of RAM and 8 GB of swap combined. Because your need for fast immediate storage has increased, so 8-quick-8-slow is no longer sufficient. You are right in that this doesn't mean 16-quick-16-slow is sensible, and 128-quick-128-slow would be ridiculous. But no swap at all doesn't make sense either: on your machine imbued with silly amounts of RAM are you really going to miss a few GB of space allocated just-in-case? When it could be the difference between slower operation for a short while and some thing(s) getting OOM-killed? | |
| ▲ | man8alexd 11 hours ago | parent [-] | | Swap is not a replacement for RAM. It is not just slow. It is very-very-very slow. Even SSDs are 10^3 times slower than RAM at random access with small 4K blocks. Swap is for allocated but unused memory. If the system tries to use swap as active memory, it is going to become unresponsive very quickly - 0.1% memory excess causes a 2x degradation, 1% - 10x degradation, 10% - 100x degradation. | |
| ▲ | AtlasBarfed 7 hours ago | parent [-] | | What is allocated but unused memory? That sounds like memory that will be used in the near future, and we are scheduling in an annoying disk load when it is needed. You are of course highlighting the problem that virtual addressing was intended to over-abstract memory resource usage, but it provides poor facilities for power users to finely prioritize memory usage. The example of this is game consoles, which didn't have this layer. Game writers had to reserve parts of RAM for specific uses. You can't do this easily in Linux afaik, because it is forcing the model upon you. | |
| ▲ | man8alexd 6 hours ago | parent [-] | | Unused or inactive memory is memory that hasn't been accessed recently. The kernel maintains LRU (least recently used) lists for most of its memory pages. The kernel memory management works on the assumption that the least recently used pages are least likely to be accessed soon. Under memory pressure, when the kernel needs to free some memory pages, it swaps out pages at the tail of the inactive anonymous LRU. Cgroup limits and OOM scores allow prioritizing memory usage on a per-process and per-process-group basis. The madvise(2) syscall allows prioritizing memory usage within a process. |
|
|
| |
| ▲ | xorcist 9 hours ago | parent | prev | next [-] | | There is too much focus in this discussion on low-memory situations. You want to avoid those as much as possible. Set reasonable ulimits for your applications. The reason you want swap is that everything in Linux (and all of UNIX really) is written with virtual memory in mind. Everything from applications to schedulers will have that use case in mind. That's the short answer. Memory is expensive and storage is cheap. Even if you have 16 GB RAM in your box, and perhaps especially then, you will have some unused pages. Paging those out and using more memory to buffer I/O will give you higher performance under most normal circumstances. So having a little bit of swap should help performance. For laptops hibernation can be useful too. | |
| ▲ | Balinares 11 hours ago | parent | prev | next [-] | | Another factor other commenters haven't mentioned, although the article does bring it up: you may disable swap and you will still get paging behavior regardless, because in a pinch the kernel will reclaim pages that are mmapped to files. Most typically binaries and libraries. Which means the process in question will incur a mapped page read next time it schedules. But of course you're out of memory, so the kernel will need to page out another process's code page to make room, and when that process next schedules... Etc. This has far worse degradation behavior than normal swapping of regular data pages. Normal swapping at least gives you the breathing space to still schedule processes when under memory pressure, such as whichever OOM killer you favor. | |
| ▲ | man8alexd 11 hours ago | parent [-] | | Binaries and libraries are not paged out. Being read-only, they are simply discarded from the memory. And I'll repeat, actively used executable pages are explicitly excluded from reclaim and never discarded. |
| |
| ▲ | t-3 11 hours ago | parent | prev | next [-] | | The reason you're supposed to have swap equal in size to your RAM is so that you can hibernate, not to make things faster. You can easily get away with far less than that because swap is rarely needed. | | |
| ▲ | dspillett 11 hours ago | parent | next [-] | | > so that you can hibernate The “paging space needs to be X*RAM” and “paging space needs to be RAM+Y” predate hibernate being a common thing (even a thing at all), with hibernate being an extra use for that paging space not the reason it is there in the first place. Some OSs have hibernate space allocated separately from paging/swap space. | |
| ▲ | Balinares 11 hours ago | parent | prev [-] | | I do wish there were a way to reserve swap space for hibernation that doesn't contribute to the virtual memory. Else, by construction, the hibernation space is not sufficient for the entire virtual memory space, and hibernation will fail when virtual memory is getting full. | |
| ▲ | em-bee 8 hours ago | parent [-] | | this. i don't even want swap for my apps. they allocate too much memory as it is. i'd rather they be killed when the memory runs out or simply be prevented from allocating memory that's not there. the kind of apps that can be safely swapped out are rarely using much memory anyways. but i do want hibernate to work. |
|
| |
| ▲ | ch_123 12 hours ago | parent | prev [-] | | You're implying that people are telling you to set up swap without any reason, when in fact there are good reasons - namely dealing with memory pressure. Maybe you could fit so much RAM into your computer that you never hit pressure - but why would you do that vs allocating a few GB of disk space for swap? Also, as has been pointed out by another commenter, 8GB of swap for a system with 8GB of physical memory is overkill. | | |
| ▲ | tremon 12 hours ago | parent [-] | | I'm also in the GP's camp; RAM is for volatile data, disk is for data persistence. The first "why would you do that" that needs to be addressed is why volatile data should be written to disk. And "it's just a few % of your disk" is not a sufficient answer to that question. | | |
| ▲ | 112233 11 hours ago | parent | next [-] | | > RAM is for volatile data, disk is for data persistence. Genuinely curious where this idea has come from. Is it something being taught currently? | | |
| ▲ | tremon 8 hours ago | parent [-] | | No, not currently -- since the start of computers. This is quite literally part of Computing 101; see https://web.stanford.edu/class/cs101/lecture02.html#/9 , slides 10-12. You can ask your favourite search engine or language fabricator about the differences between RAM and disk storage, they will all tell you the same thing. Frankly, it's kind of astonishing that this needs to be explained on a site like HN. | | |
| ▲ | 112233 7 hours ago | parent [-] | | I have no idea where on those slides it says non-volatile storage should not be used for non-permanent, temporary data. It does note the main differences (speed, latency, permanence). How does that limit what data disk can be used for? What would one use Optane DIMMs for? Also, if my program requires a huge working set to process the data, why would I spend the effort and implement my own paging to temporary working files, instead of allocating a ridiculous amount of memory and letting the OS manage it for me? What is the benefit? |
|
| |
| ▲ | ch_123 11 hours ago | parent | prev [-] | | Because of cost - particularly given the current state of the RAM market. In order to have so much memory that you never hit memory spikes, you will deliberately need to buy RAM that will never be used. Note that simply buying more RAM than what you expect to use is not going to help. Going back to my post from earlier, I had a laptop with 8GB of RAM at a time when I would usually only need about 2-4GB of RAM for even relatively heavy usage. However, every once in a while, I would run something that would spike memory usage and make the system unresponsive. While I have much more than 8GB nowadays, I'm not convinced that it's enough to have completely outrun the risk of this sort of behaviour recurring. | |
| ▲ | em-bee 7 hours ago | parent [-] | | how much swap do you have? i have 16GB now, and 16GB ram. i had a machine before with 48GB ram. obviously having more ram and no swap should perform better than the same amount of memory split into ram and swap. |
|
|
|
| |
| ▲ | man8alexd 12 hours ago | parent | prev | next [-] | | 8-16-32gb of swap space without cgroup limits would get the system into swap thrashing and make it unresponsive. | |
| ▲ | ch_123 12 hours ago | parent | prev | next [-] | | I think it's some kind of misplaced desire to be "lightweight" and avoid allocating disk space that cannot be used for regular storage. My motivation way back when for wanting to avoid swap was due to concerns about SSD wear issues, but those have been solved for a long time ago. | |
| ▲ | direwolf20 10 hours ago | parent | prev | next [-] | | Swap causes thrashing, making the whole system unusable, instead of a clean OOM kill | | |
| ▲ | NekkoDroid 10 hours ago | parent | next [-] | | IMO OOM killing should be reserved for single processes misbehaving. When a lot of different applications just use a decent amount of memory and exhaust the system RAM swapping to disk is the appropriate thing to do. | | |
| ▲ | man8alexd 10 hours ago | parent [-] | | When you set cgroup limits, you tell the kernel how to determine when a process is misbehaving and needs to be OOM-killed. |
| |
| ▲ | man8alexd 10 hours ago | parent | prev [-] | | swap causes thrashing if you have too large swap and no cgroup limits. |
| |
| ▲ | AtlasBarfed 7 hours ago | parent | prev [-] | | 1) In the Microsoft days I would have a lot of available RAM, but Windows would still aggressively swap, and I would get enraged when switching to an app that had to swap back in while I had 4GB of memory free. 2) The OS tried to be magical, but a swap thrash is still crap... I would much rather OOM-kill apps than swap thrash. For a desktop user: kill the fucking browser or Electron apps, don't freeze the system/UI. |
| |
| ▲ | solstice 12 hours ago | parent | prev [-] | | I had a similar experience with Kubuntu on an XPS 13 from 2016 with only 8GB of RAM, with the system suddenly freezing so hard that a hard reboot was required. While looking for the cause, I noticed that the system had only 250 MB of swap space. After increasing that to 10 GB there have been no further instances of freezing so far. |
| |
| ▲ | wongarsu 12 hours ago | parent | prev | next [-] | | It's unloved on Linux because using Linux under memory pressure sucks. But that's not a good reason to abandon improvements. Even more so with the direction RAM prices are headed | | |
| ▲ | man8alexd 12 hours ago | parent | next [-] | | It sucks without proper cgroup limits because swap makes OOM slower to trigger. Either set the cgroup limits or make the swap small. | | | |
| ▲ | gf000 11 hours ago | parent | prev [-] | | It sucks for interactive use only. It could be solved in user space (see the other comment with cgroups), it just isn't. |
| |
| ▲ | SCdF 12 hours ago | parent | prev | next [-] | | You should still use swap. It's not "2x RAM" as advice anymore, and hasn't been for years: https://chrisdown.name/2018/01/02/in-defence-of-swap.html tl;dr; give it 4-8GB and forget about it. | | |
| ▲ | ch_123 12 hours ago | parent [-] | | I've heard "square root of physical memory" as a heuristic, although in practice I use less than this with some of my larger systems. | | |
| ▲ | man8alexd 12 hours ago | parent [-] | | The proper rule of thumb is to make the swap large enough to keep all inactive anonymous pages after the workload has stabilized, but not too large to cause swap thrashing and a delayed OOM kill if a fast memory leak happens. | | |
| ▲ | boomlinde 11 hours ago | parent [-] | | That's not so much a rule of thumb as an assessment you can only make after thorough experimentation or careful analysis. | | |
| ▲ | NoGravitas 6 hours ago | parent | next [-] | | It doesn't take that much experimentation, though. Either set up not enough swap and keep increasing it by a little bit until you stop needing to increase it, or set up too much, and monitor your max use for a while (days/weeks), and then decrease it to a little more than the max you used. | | |
| ▲ | SAI_Peregrinus 5 hours ago | parent [-] | | I went with "set up 0 swap" and then never needed to increase it. I built my PC in 2023, when RAM prices were still reasonable, stuck 128GiB of ECC DDR5 in, and haven't run into any need for swap. Start with 0, turn on zswap, and if you don't have enough RAM then make a swap file & set it up as backing for zswap. |
| |
| ▲ | man8alexd 11 hours ago | parent | prev [-] | | You don't need "thorough experimentation or careful analysis". Just keep free swap space below a few hundred megabytes but above zero. | |
| ▲ | boomlinde 10 hours ago | parent [-] | | "Keep swap space below few hundred megabytes but above zero" is a good example of a rule of thumb. "Make the swap large enough to keep all inactive anonymous pages after the workload has stabilized, but not too large to cause swap thrashing and a delayed OOM kill if a fast memory leak happens" is not. |
|
|
|
|
| |
| ▲ | NoGravitas 6 hours ago | parent | prev | next [-] | | Even if you have plenty of memory for your work load, there are useful performance reasons for having some swap. The TLDR section of this link covers the important bits. https://chrisdown.name/2018/01/02/in-defence-of-swap.html Relatively speaking, you do need a lot less swap than you did back in the day, but performance will suffer if you don't have some, and having too much doesn't cost you anything except storage. | |
| ▲ | sl-1 13 hours ago | parent | prev | next [-] | | It is still useful for many workloads; I use it at work and on my own machines |
| ▲ | krautsauer 11 hours ago | parent | prev [-] | | I rely on it heavily. Have you tried zram swap? |
|