ahartmetz 4 hours ago

PREEMPT_LAZY triggering on page faults seems like a bad idea in light of this. It is probably not a good idea to suspend processes right when they get unexpectedly bogged down. The logic makes a little more sense for syscalls that are expected to take long compared to a scheduling quantum (a few milliseconds). But page faults are mostly invisible and unplannable.

It only took a few decades for Linux to get a good CPU scheduler, and good I/O schedulers too. I don't get how such an important area can stay so bad for so long. But then, bad scheduling is everywhere. I find it a pretty fun area to work in, but judging by how often it's less than half-assed in existing software, most developers seem to hate dealing with it?

AlienRobot 3 hours ago | parent | next [-]

One thing I miss from using Windows is that the desktop didn't just freeze completely if you ran out of RAM.

At first I thought that maybe Linux doesn't have ways to give priority to the desktop environment (a.k.a. "graphical shell") which is why running out of RAM means your cursor starts lagging, clicking on things stops working, etc.

But maybe Linux is just bad at that in general, and a single process eating too much RAM can simply bring the whole system to a halt as it tries to move and compress memory to a pagefile on an HDD (not an SSD).

Every time it happens to me I just find it so incredible. Here I am with a PC with multiple cores and multiple processors, and a single process eating all the RAM can bottleneck ALL of them at once? Am I misunderstanding something? Shouldn't it, ideally, work in such a way that as long as one processor is free, the system can process mouse input, render the cursor, and do all the desktop stuff no matter what I/O is happening in the background?

Since it's Linux maybe it's just my DE/distro (Cinnamon/Mint). Maybe it does allocations under the assumption there will always be a few free bytes in RAM available, so it halts if RAM runs out while some other DE wouldn't. But even then you'd think there would be a way to just reserve "premium" memory for critical processes so they never become unresponsive.

I wonder if other people have the same experience as me. This part of Linux just always felt fundamentally poor for me.

rcxdude 3 hours ago | parent | next [-]

This issue is much worse if you don't have swap. What happens, I think, is that as memory allocated by processes grows to fill the available RAM, it starts to push out memory that doesn't technically need to be in RAM, like cached file pages. Which accounts for some of the slowdown, until it reaches the code itself, which is 'just' a memory-mapped file. So eventually most of the code that is actively trying to run has been pushed out of RAM and must be loaded back in as it executes, slowing everything to a crawl and generally creating a death spiral. If you have swap, the kernel can decide to put other pages onto disk and keep the more important stuff in memory. Or you can run something like earlyoom, which stops things from getting to that point in the first place (albeit in a somewhat brute-force manner).

Dealing with low-memory situations elegantly is pretty hard: firstly, Linux uses memory overcommit by default, in part because the semantics of fork imply very large memory commitments which are almost never realised, and in part because a lot of software does the same because it's the default. Secondly, handling allocation failures is often tricky and ill-tested, and often requires co-ordination between different systems. The DE could, though, in principle, put running applications in a container which would prevent them from using more than a certain amount of memory, but the results are similar to earlyoom in that reaching the limit almost certainly means terminating the process using the most memory.

AlienRobot 2 hours ago | parent [-]

Yes, but the problem, I feel, is the priority of what gets pushed out of RAM on Linux.

You could split the processes into 2 categories:

1: applications that are doing tasks the user wants.

2: OS processes that the user needs to interact with in order to terminate applications.

There is an argument for applications taking priority: the user wants to do a task, and if you move the application out of RAM, the task is going to take longer.

But to me OS processes, including the graphical shell (taskbar, windowing system, etc.), should have priority: if an application hangs on I/O, the user NEEDS to be able to use the taskbar in order to terminate the application, otherwise they're going to have to wait who knows how long for the application to finish its task (or just hard reset the computer).

I don't know anything about how Linux handles memory, but the impression I have is that it has its priorities wrong, or it may not even have a way to configure priorities (unlikely), or maybe there is a way to prioritize what is kept in memory but it only distinguishes kernel/userspace memory, so DEs that sit in userspace don't get priority (i.e. it's inadequate for a graphical operating system).

To be frank, as a desktop Linux user my biggest fear is that the Linux kernel is perfectly capable of prioritizing kernel vs. userspace memory, but has no way to prioritize DEs. In other words, that the "graphical OS" use case of Linux is a second-class citizen, a feature bolted on top of GNU/Linux/systemd. Because that would mean a lot of things are considered only from the perspective of a Linux server. This is only my imagination talking, since I'm not really involved with how Linux works. But to be fair I was never involved with how Windows worked either, and I never doubted it considered the desktop a primary use case.

jcgl 3 hours ago | parent | prev | next [-]

Same experience here. Linux admin. I’d absolutely love to be told I’m holding it wrong, but all I can see is that there’s no way to hold it right.

Your consternation is seconded.

baq 2 hours ago | parent | next [-]

It’s even worse than that… you can hard lock a system with significant freeable memory left if you have insane vm.dirty_* settings (which is of course the case by default)

rcxdude 3 hours ago | parent | prev [-]

The two mitigations are to:

- (somewhat counterintuitively) have swap enabled

- run something like earlyoom to stop the system from reaching a low-RAM situation in the first place.

nijave an hour ago | parent | prev [-]

More aggressive oomkiller and cgroups have helped in recent years

Edit: systemd-oomd is what I was thinking of

bobmcnamara 4 hours ago | parent | prev [-]

Userspace spinlocks seem like a risky idea too.

What if it was on a VM and the core holding the lock got descheduled from the hypervisor?