ozgrakkurt 8 hours ago

You need databases if you need any kind of atomicity. Doing atomic writes is extremely fragile if you are just on top of the filesystem.

This is also why many databases have persistence issues and can easily corrupt on-disk data on crash. RocksDB on Windows a couple of years back is a simple example: it regularly had corruption issues during development with it.

dkarl 7 hours ago | parent | next [-]

Honestly, at this point, if I had a design that required making atomic changes to files, I'd redo the design to use SQLite. The other way around sounds crazy to me.

"Why use spray paint when you can achieve the same effect by ejecting paint from your mouth in a uniform high-velocity mist?" If you happen to have developed that particular weird skill, by all means use it, but if you haven't, don't start now.

That probably sounds soft and lazy. I should learn to use my operating system's filesystem APIs safely. It would make me a better person. But honestly, I think that's a very niche skill these days, and you should consider if you really need it now and if you'll ever benefit from it in the future.

Also, even if you do it right, the people who inherit your code probably won't develop the same skills. They'll tell their boss it's impossibly dangerous to make any changes, and they'll replace it with a database.

ricardobeat 3 hours ago | parent | next [-]

> They'll tell their boss it's impossibly dangerous to make any changes, and they'll replace it with a database.

This, 100%. Development today is driven by appearances though, and you can take advantage of that. Give it a cute name, make sure you have AI generate an emoji-rich README for it, publish it as an open source npm package, then trigger CI a few thousand times to get a pretty download count. They will happily continue using it without fear!

pythonaut_16 3 hours ago | parent | next [-]

Heuristically they'd be right to say that though.

If you start a new job and on your first day they go "Yeah the last guy said we don't need a database, so he rolled his own." are you gonna be excited, or sweating?

Exception being perhaps "The last team chose to build their own data layer, and here's the decision log and architecture docs proving why it was needed."

hunterpayne 2 hours ago | parent | prev [-]

Serious question, why are people here acting as if formatted files are somehow more reliable than a DB? That just simply isn't true. For most of software development's history, using flat files for persistence of data was the wrong thing to do with good reason. Flat files can easily be corrupted, and that happens much more often than a DB gets corrupted. The reason you might think otherwise is just sampling bias.

btilly 2 hours ago | parent [-]

I do believe that you are missing a healthy dose of sarcasm. Such as faking downloads to give yourself inflated statistics so that your employer will trust untested and AI-written garbage.

That said, there really are good use cases for readable formatted files. For example configuration files that are checked into source control are far more trackable than a SQLite database for the same purpose. For another example, the files are convenient for a lot of data transfer purposes.

But for updateable data? You need a really good reason not to simply use a database. I've encountered such edge cases. But I've encountered a lot more people who thought that they had an edge case, than really did.

duped 5 hours ago | parent | prev [-]

The problem is that most of the time when you want "atomic changes to files" the only safe API is copy the file, mutate it, then rename. That doesn't factor in concurrent writers or advisory locks.

If that kind of filesystem traffic is unsuitable for your application then you will reinvent journaling or write-ahead logging. And if you want those to be fast you'll implement checkpointing and indexes.
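To make the write-ahead logging idea concrete, here is a minimal sketch of an append-only log with a checksum per record. The record layout (`HEADER`) and the function names `append_record` and `replay` are made up for illustration. It leaves out checkpointing, indexes, and log truncation, and it does not fsync the directory when the log is first created.

```python
import os
import struct
import zlib

# Hypothetical record layout: 4-byte payload length, 4-byte CRC32, then the payload.
HEADER = struct.Struct("<II")


def append_record(log_path: str, payload: bytes) -> None:
    """Append one record and force it to disk before returning."""
    with open(log_path, "ab") as log:
        log.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        log.flush()
        os.fsync(log.fileno())


def replay(log_path: str) -> list[bytes]:
    """Return every intact record, stopping at the first torn or corrupt one."""
    records: list[bytes] = []
    try:
        with open(log_path, "rb") as log:
            data = log.read()
    except FileNotFoundError:
        return records
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # a crash mid-append leaves a partial tail; ignore it
        records.append(payload)
        offset = start + length
    return records
```

On startup the application would call `replay` and apply the surviving records in order. Everything beyond that (compaction, fast lookups) is the part a database already provides.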

noselasd 5 hours ago | parent | prev | next [-]

Yes, the code in the article will at one unlucky point end up with an empty file after a power outage.

At least write to a temp file (in the same filesystem), fsync the file and its folder, and rename it over the original.
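A minimal sketch of that sequence in Python, assuming a POSIX system (the directory fsync does not work this way on Windows). The helper name `atomic_write` is made up for illustration. Note that `tempfile.mkstemp` creates the file with mode 0600, so the replaced file may end up with different permissions than the original.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents to stable storage
        os.replace(tmp_path, path)  # rename over the original
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This covers only a single writer. It does nothing about two processes doing read-modify-write at the same time.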

creatonez 7 hours ago | parent | prev | next [-]

For the simple case, it isn't necessarily that fragile. Write the entire database to a temp file, then after flushing, move the temp file to overwrite the old file. All Unix filesystems will ensure the move operation is atomic. Lots of "we dump a bunch of JSON to the disk" use cases could be much more stable if they just did this.

Doesn't scale at all, though - all of the data that needs to be self-consistent needs to be part of the same file, so unnecessary writes go through the roof if you're only doing small updates on a giant file. Still gotta handle locking if there is risk of a stray process messing it up. And doing this only handles part of ACID.
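For the locking part, one possible sketch uses an advisory `flock` on a separate lock file, so the lock is not lost when the data file is replaced by the rename. It assumes a Unix system (the `fcntl` module) and that every writer cooperates by taking the same lock; a stray process that ignores the lock is not stopped. The names `exclusive_lock` and `update_json`, and the `.lock` / `.tmp` suffixes, are made up for illustration.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path: str):
    """Hold an advisory exclusive lock for the duration of the with-block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the current holder releases
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def update_json(path: str, mutate) -> None:
    """Read-modify-write a JSON file under the lock, then rename into place."""
    with exclusive_lock(path + ".lock"):
        try:
            with open(path, "r", encoding="utf-8") as f:
                state = json.load(f)
        except FileNotFoundError:
            state = {}
        mutate(state)
        tmp_path = path + ".tmp"  # safe to reuse: only the lock holder writes it
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
```

A caller would do something like `update_json("state.json", lambda s: s.update(hits=s.get("hits", 0) + 1))`. Every update still rewrites the whole file, which is the scaling problem described above.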

hunterpayne an hour ago | parent | next [-]

"All Unix filesystems will ensure the move operation is atomic."

This is false, but most fs will. However, there are a lot of fs calls you probably don't know about that you have to make for the fs operations to be atomic.

PS The way you propose is probably the hardest way to do an atomic FS operation. It will have the highest probability of failure and the longest period of operations and service disruption. There is good reason we move rows one at a time or in batches sized to match OS buffers.

jeffffff 6 hours ago | parent | prev [-]

don't forget to fsync the file before the rename! and you also need to fsync the directory after the rename!

goerch 7 hours ago | parent | prev | next [-]

Nice, so we are already covering the A of ACID. And don't get me started about what OLAP databases like DuckDB can do for out of core workloads.

gavinray 4 hours ago | parent | prev | next [-]

  > Doing atomic writes is extremely fragile if you are just on top of the filesystem.
This is not true, at least in Linux.

  pwritev2(fd, iov, iovcnt, offset, RWF_ATOMIC);
The requirements being that the write must be block-aligned and no larger than the underlying FS's guaranteed atomic write size.

tclancy 3 hours ago | parent [-]

Sure, but how many people using files as a data store even know to worry about atomicity?

sgarland 3 hours ago | parent [-]

They will learn eventually, and then they’ll get to write a blogpost describing something that any sysadmin or kernel dev could’ve told them. Win-win!

wasabi991011 5 hours ago | parent | prev | next [-]

Yes, this is covered in the "When do you actually need a database?" section of the article.

vector_spaces 6 hours ago | parent | prev [-]

I mean, if your atomic unit is a single file and you can tolerate simple consistency models, flat files are perfectly fine. There are many use cases that fit here comfortably where a whole database would be overkill.