Panzerschrek 4 days ago

As I understand this article, swap is useful for cases where many long-lived programs (daemons) allocate a lot of memory, but almost never access it. But wouldn't it be better to avoid writing such programs? And how much memory can such daemons consume? A couple of hundred megabytes total? Is it really that much on modern systems?

My experience with swap shows that it only makes things worse. When I program, my application may sometimes allocate a lot of memory due to some silly bug. In such cases the whole system practically stops working - even the mouse cursor can't move. If I am lucky, the OOM killer will eventually kill my buggy program, but after that it's not over - almost all used memory is now in swap and the whole system works snail-slow, presumably because the kernel doesn't think it should really unswap previously swapped memory and does so only on demand and only page by page.

In a hypothetical case without swap this isn't so painful. When main system memory is almost fully consumed, the OOM killer kills the most memory-hungry program and all other programs just continue working as before.

I think that overall reliance on swap is nowadays just a legacy of old times when main memory was scarce, and back then it maybe was useful to have swap. OS kernels should be redesigned to work without swap; this would make system behavior smoother, and the kernel code could be simpler (all this swapping code could be removed) and thus faster.

Rohansi 4 days ago | parent | next [-]

> As I understand this article, swap is useful for cases where many long-lived programs (daemons) allocate a lot of memory, but almost never access it. But wouldn't it be better to avoid writing such programs?

Ideally yes, but is that something you keep in mind when you write software? Do you ever consider freeing memory just because it hasn't been used in a while? How do you decide when to free it? This is all handled automatically when you have swap enabled, and at a finer granularity than you could practically implement manually.

Panzerschrek 4 days ago | parent | next [-]

I write mostly C++ or Rust programs. In these languages memory is freed as soon as it's no longer in use (thanks to destructors). So, usually this shouldn't be actively kept in mind. The only exceptions are cases like caches, but long-running programs should use caching carefully - limit cache size and free cache entries after some amount of time.

Programs that allocate large amounts of memory without strict necessity are just a consequence of swap's existence. "Thanks" to swap they were never properly tested in low-memory conditions, and thus the necessary optimizations were never done.

Rohansi 4 days ago | parent | next [-]

You'll also need to consider that the allocator you're using may not immediately return freed memory to the system. That memory is free to be reused by your application, but it still counts as used memory mapped to your process.

Anyway, it's easy to discuss best practices, but getting people to actually follow them is the real issue. If you disable swap and the software you're running isn't optimized to minimize idle memory usage, then your system will be forced to keep all of that data in RAM.

man8alexd 4 days ago | parent [-]

You are both confusing swap and the memory overcommit policy. You can disable swap by compiling the kernel with `CONFIG_SWAP=n`, but that won't change the memory overcommit policy, and programs will still be able to allocate more memory than is available on the system. There is no problem in allocating virtual memory - if it isn't used, it never gets mapped to physical memory. The problem is when a program tries to use more memory than the system has; you will get OOMs even with swap disabled. You can disable memory overcommit, but that is only going to result in malloc() failing early while you still have tons of memory.

Rohansi 3 days ago | parent [-]

Overcommit is different. We are referring to infrequently used memory - allocated, has been written to, but is not accessed often.

man8alexd 3 days ago | parent [-]

> Programs that allocate large amounts of memory without strict necessity are just a consequence of swap's existence.

This. The ability to allocate large amounts of memory is due to memory overcommit, not the existence of swap. If you disable swap, you can still allocate memory with almost no restrictions.

> This is all handled automatically when you have swap enabled

And this. This statement doesn't make any sense. If you disable swap, kernel memory management doesn't change; you only lose the ability to reclaim anonymous pages.

csmantle 4 days ago | parent | prev | next [-]

A side note: stack memory is usually not physically returned to the OS. When (de)allocating on the stack, only the stack pointer is moved within the pages preallocated by the OS.

jibal 4 days ago | parent | prev | next [-]

> Programs that allocate large amounts of memory without strict necessity are just a consequence of swap's existence. "Thanks" to swap they were never properly tested in low-memory conditions, and thus the necessary optimizations were never done.

Who told you this? It's not remotely true.

Here's an article about this subject that you might want to read:

https://chrisdown.name/2018/01/02/in-defence-of-swap.html

rwmj 4 days ago | parent | prev [-]

> In these languages memory is freed as soon as it's no longer in use (thanks to destructors).

Unless you have an almost pathological attention to detail, that is not true at all. And even if you do precisely scope your destructors, the underlying allocator won't return the memory to the OS (which is what matters here) immediately.

immibis 4 days ago | parent | prev [-]

And were you aware that freeing memory only allows it to be reallocated within your process but doesn't actually release it from your process? State-of-the-art general-purpose allocators are actually still kind of shit.

zozbot234 4 days ago | parent | prev | next [-]

> In a hypothetical case without swap this isn't so painful. When main system memory is almost fully consumed, the OOM killer kills the most memory-hungry program

That's not how it works in practice. What happens is that program pages (and read-only data pages) get gradually evicted from memory and the system still slows to a crawl (to the point where it becomes practically unresponsive) because every access to program text outside the current 4KB page now potentially involves a swap-in. Sure, eventually, the memory-hungry task will either complete successfully or the OOM killer will be called, but that doesn't help you if you care about responsiveness first and foremost (and in practice, desktop users do care about that - especially when they're trying to terminate that memory hog).

Panzerschrek 3 days ago | parent [-]

Why not just always preserve program code in memory? It's usually not that much - a typical executable is several megabytes in size, and many processes can share the same code pages (especially with shared libraries).

creshal 3 days ago | parent | next [-]

> It's usually not that much - a typical executable is several megabytes in size, and many processes can share the same code pages (especially with shared libraries)

Have a look at Chrome. Then have a look at all the Electron "desktop" apps, which all ship with a different Chrome version and different versions of shared libraries, none of which can share memory pages because they're all subtly different. You'll find similar patterns across many, many other workloads.

teddyh 3 days ago | parent [-]

Or modern languages, like Rust and Go, which have decided that runtime dependencies are too hard and instead build enormous static binaries for everything.

inkyoto 3 days ago | parent | prev | next [-]

> Why not just always preserve program code in memory?

Because the code is never required in its entirety – only «currently» active code paths need to be resident in memory, the rest can be discarded when inactive (or never even gets loaded into memory to start off with) and paged back into memory on demand. Since code pages are read only, the inactive code pages can be just dropped without any detriment to the application whilst reducing the app's memory footprint.

> […] typical executable is usually several megabytes

Executable size != the size of the actually running code.

In modern operating systems with advanced virtual memory management systems, the actual resident code size can go as low as several kilobytes (or, rather, a handful of pages). This, of course, depends on whether the hot paths in the code have a close affinity to each other in the linked executable.

man8alexd 3 days ago | parent | prev [-]

Programs and shared libraries (pages with VM_EXEC attribute) are kept in the memory if they are actively used (have the "accessed" bit set by the CPU) and are least likely to be evicted.

MomsAVoxell 4 days ago | parent | prev | next [-]

> But wouldn't it be better to avoid writing such programs?

Think long-term recording applications, such as audio or studio situations where you want to "fire and forget" reliable recording systems of large amounts of data consistently from multiple streams for extended durations, for example.

dns_snek 4 days ago | parent [-]

Why wouldn't you write that data to disk? Holding it all in RAM isn't exactly a reliable way of storing data.

MomsAVoxell 3 days ago | parent [-]

What do you think is happening with swap, exactly?

robotresearcher 3 days ago | parent | next [-]

A process’s memory in swap does not persist after the process quits or crashes.

MomsAVoxell 3 days ago | parent [-]

That is true, but the point is that having swap available increases the time between recording samples and needing to commit them to disk.

Well-written, long term recording software doesn’t quit or crash. It records what it needs to record, and - by using swap - gives itself plenty of time to flush the buffers using whatever techniques are necessary for safety.

Disclaimer: I’ve written this software, both with and without swap available in various embedded contexts, in real products. The answer to the question is that having swap means higher data rates can be attained before needing to sync.

robotresearcher 3 days ago | parent [-]

> Well-written, long term recording software doesn’t quit or crash.

Power outages, hardware failures, and OS bugs happen to the finest application software.

I believe you, from your experience, that it can be useful to have recorded buffers swap out before flushing them to durable storage. But I do find it a bit surprising: since the swap system has to do the storage flush, you are paying for the IO anyway - why not do it durably?

The fine article argued that you can save engineer cycles by having the OS manage optimizing out-of-working set memory for you, but that isn’t what you’re arguing here.

I’m interested in understanding your point.

MomsAVoxell 2 days ago | parent [-]

I guess the point is, sometimes you just need a lot of memory and want to record into it as quickly as you can.

Then, when the time is right, flush it all to disk.

The VMM is pretty good at being tight and productive - so use it as the big fat buffer it is, and spawn worker threads to flush things at appropriate times.

If you don't have swap, you have to flush more often ...

dns_snek 3 days ago | parent | prev [-]

That's weirdly passive aggressive, swap isn't durable data storage.

Reliably recording massive amounts of data for extended periods of time in a studio setting is the most obvious use case for a fixed-size buffer that gets flushed to durable storage at short and predictable time intervals. You wouldn't want a segfault wiping out the entire day's worth of work, would you?

MomsAVoxell 3 days ago | parent [-]

I didn’t mean to imply that swap was durable data storage.

Having swap/more memory available just means you have more buffers before needing to commit and in certain circumstances this can be very beneficial, such as when processing of larger amounts of logged data is needed prior to committing, etc.

There is certainly a case for both having and using swap, and disabling it entirely, depending on the data load and realtime needs of the application. Processing data and saving data have different requirements, and the point is really that there is no black and white on this. Use swap if it’s appropriate to the application - don’t use it, if it isn’t.

dns_snek 3 days ago | parent [-]

I don't really understand what problem you're solving by doing it that way.

Instead of storing data (let's call them samples) to durable storage to begin with, you're letting the OS write them to swap, which incurs the same cost, but then you need to read them back from swap and write them to a different partition again (~triple the original cost).

MomsAVoxell 2 days ago | parent [-]

Actually the VMM is pretty performant, all things considered. Having more memory, managed for the process by the VMM, means less fuss doing a flush than if you were to memory-constrain things out of the gate.

Yes, sometimes, it's perfectly acceptable to flush to disk because you're getting low on RAM. But, on systems with, say .. 4x more swap than physical RAM .. you don't have to do a flush that often - if at all. This is great, for example with high quality audio loads that have to be captured safely over long periods.

A system with low RAM and high swap is also a bit more economical, at scale, when building actual hardware in large numbers. So exploiting swap in that circumstance can also affect the BOM costs.

toast0 3 days ago | parent | prev | next [-]

You may benefit by reducing your swap size significantly.

The old rule of thumb of 1-2x your RAM is way too much for most systems. The solution isn't to turn swap off, but to set a sensible limit. Try half a gig of swap and see how that does. It may give you time to notice the system is degraded, pick something to kill yourself, and maybe even debug the memory issue if needed. You're not likely to have lasting performance issues from too many things swapped out after you or the OOM killer end the memory pressure, because not much of your memory will fit in swap.

jcynix 3 days ago | parent | prev | next [-]

> When I program, my application may sometimes allocate a lot of memory due to some silly bug. In such case the whole system practically stops working [...]

You can limit resource usage per process, so your buggy application could be killed long before the system comes to a crawl. See your shell's entry on its limit/ulimit built-in, or use

man prlimit(1) - get and set process resource limits

blueflow 4 days ago | parent | prev | next [-]

Programs run from program text, and program text is mapped in as named pages (disk cache). They are evictable! And without swap, they will get evicted under high memory pressure. Program text thrashing is worse than having swap.

The problem is not the existence of swap, but that people are unaware that the disk cache is equally important for performance.

man8alexd 3 days ago | parent | next [-]

VM_EXEC pages are explicitly deprioritized from reclaim by the kernel. Unlike other pages, they are put into the active LRU on first use and remain in the active LRU while they stay active.

blueflow 3 days ago | parent [-]

... until there are no deprioritized pages left to evict.

man8alexd 3 days ago | parent [-]

Pages from the active LRU are not evicted. Pages from the inactive LRU with the "accessed" bit set are also not evicted.

Panzerschrek 3 days ago | parent | prev [-]

That's yet more legacy cruft - loading program code from disk on demand. Nowadays it would be easier to just load the whole executable into memory and keep it there.

kalleboo 3 days ago | parent | prev | next [-]

> When I program, my application may sometimes allocate a lot of memory due to some silly bug

I had one of those cases a few years ago when a program I was working on was leaking 12 MP raw image buffers in a drawing routine. I set it off running and went browsing HN/chatting with friends. A few minutes later I was like "this process is definitely taking too long" and when I went to check on it, it was using up 200+ GB of RAM (on a 16 GB machine) which had all gone to swap.

I hadn't noticed a thing! Modern SSDs are truly a marvel... (this was also on macOS rather than Linux, which may have a better swap implementation for desktop purposes)

inkyoto 3 days ago | parent | prev | next [-]

Swapping (or, rather, paging - I don't think there is an operating system in existence today that swaps out entire processes) does not make modern systems slower. That is a delusion and an urban legend that originated in the sewers of the intertubes, based on uninformed opinion rather than an understanding of how virtual memory systems work. It has been regurgitated to death, and the article explains really well why it is a delusion.

20-30 years ago, heavy paging often crippled consumer Intel-based PCs[0] because paging went to slow mechanical hard disks on PATA/IDE, a parallel device bus (until circa 2005) with little parallelism and initially no native command queuing; SCSI drives did offer features such as tagged command queuing and efficient scatter-gather, but were uncommon on desktops, let alone laptops. Today the bottlenecks are largely gone - abundant RAM, switched interconnects such as PCIe, SATA with NCQ/AHCI, and solid-state storage, especially NVMe, provide low-latency, highly parallel I/O - so paging still signals memory pressure, yet is far less punishing on modern laptops and desktops.

Swap space today has a quieter benefit: lower energy use. On systems with LPDDR4/LPDDR5, the memory controller can place inactive banks into low-power or deep power-down states; by compressing memory and paging out cold, dirty pages to swap, the OS reduces the number of banks that must stay active, cutting DRAM refresh and background power. macOS on Apple Silicon is notably aggressive with memory compression and swap and works closely with the SoC power manager, which can contribute to the strong battery life of Apple laptops compared with competitors, albeit this is only one factor amongst several.

[0] RISC workstations and servers have had switched interconnects since day 1.


Ferret7446 3 days ago | parent | prev | next [-]

Kinda, basically. Swap is a cost optimization for "bad" programs.

Having more RAM always gives better performance, but swap allows you to skimp on RAM in certain cases for almost identical performance at lower cost (of buying more RAM), if you run programs that allocate a lot of memory they subsequently don't use. I hear Java is notoriously bad at this, so if you run a lot of heavy enterprise Java software, swap can get you the same performance with half the RAM.

(It is also a "GC strategy", or stopgap for memory leaks. Rather than managing memory, you "could" just never free memory, and allocate a fat blob of swap and let the kernel swap it out.)

creshal 3 days ago | parent | prev [-]

> But wouldn't it be better to avoid writing such programs?

Yes, indeed, the world would be a better place if we had just stopped writing Java 20 years ago.

> And how many memory such daemons can consume? A couple of hundred megabytes total?

Consider the average Java or .net enterprise programmer, who spends his entire career gluing together third-party dependencies without ever understanding what he's doing: Your executable is a couple hundred megabytes already, then you recursively initialize all the AbstractFactorySingletonFactorySingletonFactories with all their dependencies monkey patched with something worse for compliance reasons, and soon your program spends 90 seconds simply booting up and sits at two or three dozen gigabytes of memory consumption before it has served its first request.

> Is it really that much on modern systems?

If each of your Java/.net business app VMs needs 50 or so gigabytes to run smoothly, you can only squeeze ten of them into a 1U pizza box with a mere half terabyte of RAM; while modern servers allow you to cram in multiple terabytes, do you really want to spend several tens of thousands of dollars on extra RAM when swap storage is basically free?

Cloud providers do the same math, and if you look at e.g. AWS, swap on EBS costs as much per month as the same amount of RAM costs per hour. That's almost three orders of magnitude cheaper.

> When I program, my application may sometimes allocate a lot of memory due to some silly bug.

Yeah, that's on you. Many, many mechanisms let you limit per-process memory consumption.

But as TFA tries to explain, dealing with this situation is not the purpose of swap, and never has been. This is a pathological edge case.

> almost all used memory is now in swap and the whole system works snail-slow, presumably because kernel doesn't think it should really unswap previously swapped memory and does this only on demand and only page by page.

This requires multiple conditions to be met:

- the broken program is allocating a lot of RAM, but not quickly enough to trigger the OOM killer before everything has been swapped out

- you have a lot of swap (do you follow the 1990s recommendation of having 1-2x the RAM amount as swap?)

- the broken program sits in the same cgroup as all the programs you want to keep working even in an OOM situation

Condition 1 can't really be controlled, since it's a bug anyway.

Condition 2 doesn't have to be met unless you explicitly want it to. Why do you?

Condition 3 is realistically met on desktop environments: despite years of messing around with flatpaks and snaps and all that nonsense, they're still not making it easy for users to isolate programs they run that haven't been pre-containerized.

But simply reducing swap to a more realistic size (try 4GB, see how far it gets you) will make this problem much less dramatic, as only parts of the RAM have to get flushed back.

> In a hypothetical case without swap this isn't so painful. When main system memory is almost fully consumed, the OOM killer kills the most memory-hungry program and all other programs just continue working as before.

And now you're wasting RAM that could be used for caching file I/O. Have you benchmarked how much time you're wasting through that?

> I think that overall reliance on swap is nowadays just a legacy of old times when main memory was scarce, and back then it maybe was useful to have swap.

No, you just still don't understand the purpose of swap.

Also, "old times"? You mean today? Because we still have embedded environments, we have containers, we have VMs, almost all software not running on a desktop is running in strict memory constraints.

> and kernel code may be simpler (all this swapping code may be removed)

So you want to remove all code for file caching? Bold strategy.