jeffbee 3 days ago

You could leave this problem behind by switching to a filesystem that isn't full of deadlock bugs.

kiririn 3 days ago | parent | next [-]

A background thread performing blocking I/O is an implementation detail, not a bug. Other filesystems don't have or need that sort of bookkeeping, so if a block device stalls badly enough to trigger these warnings, the stall gets attributed to application threads (if at all) rather than to btrfs worker threads. Regardless, the stall very much still happens.

nubinetwork 3 days ago | parent [-]

> if a block device stalls badly

That's really the issue at heart, because I've seen these on zfs as well... but you'd think the filesystem would report some progress to keep bumping the timer so it doesn't start spamming dmesg. /shrug
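To make the "report some progress to keep bumping the timer" idea concrete, here is a minimal sketch of a progress-based watchdog. It is not how the kernel's hung-task detection is implemented; `ProgressWatchdog`, `touch`, and `is_hung` are hypothetical names, and the 120-second timeout is just an illustrative value.

```python
import time


class ProgressWatchdog:
    """Flags a worker as hung only if it reports no progress for `timeout` seconds.

    Hypothetical illustration only, not a kernel API.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_progress = clock()

    def touch(self):
        # Called by the worker whenever it completes a unit of work.
        self.last_progress = self.clock()

    def is_hung(self):
        # Hung means: no touch() for longer than the timeout.
        return self.clock() - self.last_progress > self.timeout


# Drive the watchdog with a fake clock so the behaviour is deterministic.
now = [0.0]
dog = ProgressWatchdog(timeout=120, clock=lambda: now[0])

now[0] = 100.0
dog.touch()                # slow, but still making progress
now[0] = 200.0
still_ok = dog.is_hung()   # only 100 s since the last progress report
now[0] = 400.0
stalled = dog.is_hung()    # 300 s with no progress report
```

With a scheme like this, a worker that is slow but still advancing never trips the warning; only a worker that stops calling `touch()` does.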

yjftsjthsd-h 3 days ago | parent | prev | next [-]

I am curious - is this message indicative of a problem in the fs? I would have assumed anything marked "INFO" is, tautologically, not an error, but surely a filesystem shouldn't be locking up? Or is it just suggestive of high system load or poor hardware performance?

o11c 3 days ago | parent | next [-]

In my experience, "hung task" is almost always due to running out of RAM and the scheduler constantly thrashing instead of doing useful work. I rarely actually reach the point of seeing the message since I'll sysrq-kill if early enough, or else hard-reboot.

Note also that modern filesystems do a lot of background work that doesn't strictly need to be done immediately for correctness.

(of course, it also seems common for people to completely disregard the well-documented "this feature is unreliable, don't use it" warnings that btrfs has, then complain that they have problems and not mention that they ignored the warnings until everyone is halfway through complaining)

The only problems I've encountered in all my years of using btrfs are:

* when (all copies of) a file bitrots on disk, you can't read it at all, rather than being able to copy the mostly-correct file and see if you can hand-correct it into something usable

* if you enable new compression algorithms on your btrfs volume, you can't read your data from old kernels (often on liveusb recovery disks)

* fsync is slow. Like, really really slow. And package managers designed for shitty CoW-less filesystems use fsync a lot.

jeffbee 3 days ago | parent | next [-]

Hung tasks due to low memory are a bug not a feature. Any time you put the Linux kernel under memory pressure you trigger its wealth of defects in error handling paths, none of which are tested and most of which are rarely exercised in practice. For example squashfs used to have a resource leak under memory pressure where it would exit a function without releasing a lock, after which all block operations system-wide would hang forever until reboot. Linux is absolutely crawling with that type of defect, but not uniformly. Some subsystems have more than others, and btrfs is unusually dense with them.
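The class of defect described here (leaving a function on an error path without releasing a lock) can be sketched in a few lines. This is a user-space analogy in Python, not the actual squashfs code; `read_block_leaky`, `read_block_safe`, and the `alloc_ok` flag are hypothetical names standing in for an allocation that fails under memory pressure.

```python
import threading


def read_block_leaky(lock, alloc_ok):
    """Buggy version: the error path returns without releasing the lock."""
    lock.acquire()
    if not alloc_ok:
        # BUG: early return while still holding the lock.
        return None
    data = b"block"
    lock.release()
    return data


def read_block_safe(lock, alloc_ok):
    """Fixed version: the context manager releases the lock on every exit path."""
    with lock:
        if not alloc_ok:
            return None
        return b"block"


# Simulate an allocation failure in both versions.
leaky_lock = threading.Lock()
read_block_leaky(leaky_lock, alloc_ok=False)
leaked = leaky_lock.locked()          # lock is still held after the call

safe_lock = threading.Lock()
read_block_safe(safe_lock, alloc_ok=False)
released = not safe_lock.locked()     # lock was released despite the failure

# Any later caller that needs the leaked lock cannot get it.
next_caller_got_lock = leaky_lock.acquire(timeout=0.01)
```

The bug only shows up when the failure branch is actually taken, which is why a path that is rarely exercised can hide it for a long time.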

bhaney 3 days ago | parent | prev [-]

> In my experience, "hung task" is almost always due to running out of RAM

In my case, I don't think this machine ever commits more than around 5GB of its 32GB available memory, so I doubt it's that.

> it also seems common for people to completely disregard the well-documented "this feature is unreliable, don't use it" warnings that btrfs has

Now that I am definitely doing. I won't give up raid6 until it eats all my data for a fourth time.

blueflow 3 days ago | parent | prev | next [-]

The in-kernel btrfs code locking up is something that should never happen at all. There is a rumor going around that btrfs never reached maturity and suffers from design issues.

SoftTalker 3 days ago | parent | next [-]

That's why I use ext4 exclusively on Linux. Never once had a filesystem issue.

shiroiushi 3 days ago | parent [-]

ext4 works fine on my Linux laptop and I agree, it's proven itself over many years to be supremely reliable, though it doesn't compare in features to the more complex filesystems.

On my home media server, however, I'm using ZFS in a RAID array, with regular scrubs and snapshots. ZFS has many features like RAID, scrubs, COW, snapshots, etc. that you just don't get on ext4. However, unlike btrfs, ZFS seems to have a great reputation for reliability with all its features.

kelnos 3 days ago | parent | next [-]

I use ext4 on my home media server (24TB). I'm using LVM and MD, and it's been rock solid for a couple decades now, surviving all sorts of hardware failures.

I haven't missed out on any zfs or btrfs features. Yes, I know about their benefits, and no, I don't care if a few bits flip here or there over time.

SoftTalker 3 days ago | parent | prev [-]

Granted it was at least a decade ago but the team I was on had a terrible experience with ZFS and that bad taste still lingers. And I don’t need any of its features.

yjftsjthsd-h 3 days ago | parent [-]

Could I ask you to expand on your problems with ZFS? Code bugs, data loss, operational problems, ...? (Asking because I use it and would like to learn from your problems rather than having to experience the pain myself.)

SoftTalker a day ago | parent [-]

It was a long time ago, and I wasn't directly involved. I did experience the fallout though, which was hours if not days of downtime to repair corruption. I'm sure it's better now, and these problems might not even be possible on a small home system, but I've always avoided ZFS since then. Plus it's just too complicated for my needs. As time has gone by I no longer enjoy complex technology for its own sake (I very much did when I was younger). Now I just want it to work and not demand any more of my time and brainpower than necessary. Features that I don't need are negatives.

ramon156 3 days ago | parent | prev [-]

Given the mailing-list history with Linus, I wouldn't be surprised.

shric 3 days ago | parent | prev [-]

It could be any of the above. I'd say it's INFO because the kernel itself is not in an error state; it's information about a process doing something unusual.

bhaney 3 days ago | parent | prev [-]

I was planning on it, but the filesystem I wanted to switch to keeps getting set back by the author's CoC drama.

przemub 3 days ago | parent [-]

What did you want to switch to?

I suppose the author at least isn't a murderer :)

taskforcegemini 3 days ago | parent [-]

The drama part was most likely referring to bcachefs.

pdimitar 2 days ago | parent [-]

Oh? What happened there?