KaiserPro 3 days ago

You really notice metadata performance (try a git checkout on EFS on AWS; loads of small files take fucking ages). That said, EFS is actually pretty fast: you can get decent throughput if you're writing to just one file. But if you're trying to open 1000 1 MB files to read from vs one 1 GB file, it'll be much slower (unless they've dramatically improved performance recently).
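A minimal sketch of that effect, runnable on any filesystem: write the same total payload once as many small files and once as a single big file, then time reading each layout back. The file counts and sizes here are scaled down from the 1000 x 1 MB example above; on a network filesystem the per-file open/stat/close metadata round-trips are what make the small-file case drag.

```python
import os
import tempfile
import time

CHUNK = 1024 * 1024  # 1 MiB per small file (scaled down from the example)
N = 64               # number of small files; total data = N * CHUNK

def time_reads(root: str) -> tuple[float, float]:
    """Write N small files and one big file of the same total size,
    then time reading each layout back.  Same bytes, very different
    metadata cost: the small-file pass does N opens and closes."""
    payload = os.urandom(CHUNK)

    small_dir = os.path.join(root, "small")
    os.mkdir(small_dir)
    for i in range(N):
        with open(os.path.join(small_dir, f"{i:04d}.bin"), "wb") as f:
            f.write(payload)

    big_path = os.path.join(root, "big.bin")
    with open(big_path, "wb") as f:
        for _ in range(N):
            f.write(payload)

    t0 = time.perf_counter()
    total_small = 0
    for name in sorted(os.listdir(small_dir)):
        with open(os.path.join(small_dir, name), "rb") as f:
            total_small += len(f.read())
    t_small = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(big_path, "rb") as f:
        total_big = len(f.read())
    t_big = time.perf_counter() - t0

    assert total_small == total_big == N * CHUNK
    return t_small, t_big

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        t_small, t_big = time_reads(root)
        print(f"small files: {t_small:.3f}s  single file: {t_big:.3f}s")
```

On fast local disk with a warm cache the gap may be small; point the temp directory at an NFS or EFS mount and the difference gets dramatic.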

Trying to build a fast, globally consistent database for a quadrillion items in the _same_ namespace is super hard. You have to choose a tradeoff between speed, partition tolerance and consistency.

You're much better off sharding into discrete logical units. It's very rare that you actually need a global namespace for a filesystem. For VFX, where we used Lustre a lot, the large namespace was a nice-to-have; it was really about getting a RAID-0 stripe across file servers (well, object stores) for performance.

For filesystems specifically, if you're using directories, you don't actually need to guarantee much outside of a directory. So long as filenames are unique within that directory, you can get away with a lot of shit you can't do in a normal database. You also don't need directories to be on the same filesystem (in Linux, at least), so you can shard by using the directory as a key.

The directory-as-key approach is actually hilariously simple, fast, scalable and reliable. If a single server/filesystem goes down, it only takes out that area. On the downside, it does mean you can get hot spots and overwhelm a single shard.
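The directory-as-key idea can be sketched in a few lines. Everything here is illustrative: the mount paths and helper names are hypothetical, and in practice each shard would be a separate filesystem mount (e.g. a different NFS server).

```python
import hashlib

# Hypothetical shard layout: each entry stands in for a separate
# filesystem mount backed by its own server.
SHARDS = ["/mnt/fs0", "/mnt/fs1", "/mnt/fs2", "/mnt/fs3"]

def shard_for(directory: str) -> str:
    """Pick a shard deterministically from the top-level directory name.
    Everything under one directory lands on the same filesystem, so
    within-directory guarantees (unique names, atomic rename) stay local
    to one server and need no global coordination."""
    h = hashlib.sha256(directory.encode()).digest()
    return SHARDS[int.from_bytes(h[:8], "big") % len(SHARDS)]

def resolve(path: str) -> str:
    """Map a logical path like 'projects/shot42/frame.exr' to a
    physical path on one shard, keyed by its top-level directory.
    Note the downside from above: one very hot directory concentrates
    all of its load on a single shard."""
    top = path.split("/", 1)[0]
    return f"{shard_for(top)}/{path}"
```

A lost server only takes out the directories that hashed to it, and there's no metadata server in the path at all; the mapping is pure computation on the client.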

dekhn 3 days ago | parent

We are truly spoiled by all the improvements that went into local filesystems and are lacking in network filesystems. So much of our perception of "computer is fast" is really just write-caching, read-caching and read-ahead.

KaiserPro 2 days ago | parent

Oh, NVMe and commodity 10/40/100 Gb networks mean that NFS shares can be _faster_ than local disk.

In 2008, when I was a young'un, a 100 TB filesystem that could sustain 1-3 GB/s of streaming throughput took something like 40 racks. Huge amounts of cost and power went into setting it up and maintaining it, and any kind of random IO would kneecap performance for everyone.

Now you can have a 2U server with 100 TB of NVMe storage where the only bottleneck is the network adaptor! Not only that, but it's pretty cheap too.