jandrewrogers 3 days ago
I have worked on exabyte-scale storage engines. There is a good engineering reason for this type of limitation. With a 1 KiB average file size you have quadrillions of metadata objects to search and manage quickly and at fine granularity. The kinds of operations and coordination you need to do on metadata are difficult to achieve reliably when the metadata structure itself is many PB in size. Interesting edge cases show up when you have to do deep paging of this metadata off of storage, and making that not slow requires unorthodox design choices that introduce a lot of complexity. Almost none of the metadata fits in memory, including many parts that conventional architectures assume will always fit in memory.

A mere trillion objects is right around the limit where the allocators, metadata structures, etc. can be made to scale with heroic effort before conventional architectures break down and things start to become deeply weird on the software design side. Storage engines need to be reliable, so staying away from that design frontier makes a lot of sense if you can. It is possible to break through that barrier, but it introduces myriad interesting design and computer science problems for which there is little literature.
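To put the scale in perspective, here is a rough back-of-envelope calculation. The 1 KiB average file size is from the comment above; the 1 EiB total and the ~256 bytes of metadata per object are illustrative assumptions, not figures from the comment:

```python
# Back-of-envelope: why "quadrillions of metadata objects" is hard.
# Assumptions (illustrative): 1 EiB of data, 1 KiB average object size,
# ~256 bytes of metadata per object.

data_bytes = 2**60           # 1 EiB of stored data (assumed)
avg_object_size = 2**10      # 1 KiB average file size
metadata_per_object = 256    # assumed bytes of metadata per object

objects = data_bytes // avg_object_size
metadata_bytes = objects * metadata_per_object

print(f"objects:       {objects:.3e}")                    # ~1.1e+15, i.e. quadrillions
print(f"metadata size: {metadata_bytes / 2**50:.0f} PiB")  # ~256 PiB, far beyond RAM
```

Under these assumptions the metadata alone is hundreds of PiB, which is why almost none of it can be assumed to fit in memory.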
toast0 3 days ago
Small files suck on normal filesystems too. There are reasons to have them, but if the stars align and you can go from M directories of N directories of O files to M directories of N files with O sub-files each, a lot of operations get way faster. Updates to individual sub-files probably don't, though if you're updating all the sub-files anyway and can rewrite each M/N.db in one pass, that is probably faster too.
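A minimal sketch of what that packing can look like, assuming SQLite as the per-directory container (the M/N.db naming is from the comment above; the schema and function names are illustrative):

```python
# Pack O small sub-files into one SQLite file per directory, so the
# filesystem tracks M*N files instead of M*N*O.
import sqlite3

def create_packed_db(path):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS blobs (name TEXT PRIMARY KEY, data BLOB)")
    return db

def write_all(db, items):
    # Bulk-writing all sub-files in one transaction is the fast path the
    # comment alludes to; updating one sub-file at a time pays per-transaction
    # overhead and is likely slower than a plain small file.
    with db:
        db.executemany(
            "INSERT OR REPLACE INTO blobs (name, data) VALUES (?, ?)", items)

def read_one(db, name):
    row = db.execute("SELECT data FROM blobs WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

# Usage (hypothetical paths and names):
# db = create_packed_db("M/N.db")
# write_all(db, [("file-001", b"payload"), ("file-002", b"payload")])
# data = read_one(db, "file-001")
```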
stuartjohnson12 3 days ago
This sounds like a fascinating niche piece of technical expertise I would love to hear more about. What are the biggest challenges in scaling metadata from a trillion to a quadrillion objects?