jamesblonde 6 days ago
Architecturally, it is a scale-out metadata filesystem [ref]. Other related distributed file systems are Collosus, Tectonic (Meta), ADLSv2 (Microsoft), HopsFS (Hopsworks), and I think PolarFS (Alibaba). They all use different distributed row-oriented DBs for storing metadata. 3FS uses FoundationDB, Collosus uses BigTable, Tectonic some KV store, ADLSv2 (not sure), HopsFS uses RonDB.

What's important here with 3FS is that it supports (1) a FUSE client, which just makes life so much easier, and (2) NVMe storage, so that training pipelines aren't disk I/O bound (you can't always split files small enough, or parallelize reads and writes enough, against an S3 object store).

Disclaimer: I worked on HopsFS. HopsFS adds tiered storage: NVMe for recent data and S3 for archival.

[ref]: https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...
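To make "row-oriented DB for metadata" a bit more concrete, here is a minimal sketch, assuming one row per directory entry keyed by (parent inode id, name). The `MetadataStore` and `Inode` names are hypothetical, and a plain dict stands in for the distributed database; this is not how any of the systems above is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    inode_id: int
    is_dir: bool


class MetadataStore:
    """Toy stand-in for a row-oriented metadata DB (a plain dict here)."""

    ROOT_ID = 1

    def __init__(self) -> None:
        # One row per directory entry, keyed by (parent inode id, name).
        self.rows: dict[tuple[int, str], Inode] = {}
        self.next_id = self.ROOT_ID + 1

    def create(self, parent_id: int, name: str, is_dir: bool) -> Inode:
        key = (parent_id, name)
        if key in self.rows:
            raise FileExistsError(name)
        inode = Inode(self.next_id, is_dir)
        self.next_id += 1
        self.rows[key] = inode
        return inode

    def resolve(self, path: str) -> Inode:
        # Walk the path with one point lookup per component.
        current = Inode(self.ROOT_ID, True)
        for name in filter(None, path.split("/")):
            key = (current.inode_id, name)
            if key not in self.rows:
                raise FileNotFoundError(path)
            current = self.rows[key]
        return current
```

Because every lookup is a point read on a small key, rows like these can in principle be partitioned across many database nodes, which is roughly what "scale-out metadata" is getting at.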
MertsA 6 days ago
> Tectonic some KV store,

Tectonic is built on ZippyDB, which is a distributed DB built on RocksDB.

> What's important here with 3FS is that it supports (1) a fuse client - it just makes life so much easier

Tectonic also has a FUSE client built for GenAI workloads on clusters backed by 100% NVMe storage.

https://engineering.fb.com/2024/03/12/data-center-engineerin...

Personally, what stands out to me for 3FS isn't just that it has a FUSE client, but that they made it more of a hybrid of a FUSE client and a native IO path. You open the file just like normal, but once you have an fd you use their native library to do the actual IO. You still need to adapt whatever AI training code you have to use 3FS natively if you want to avoid FUSE overhead, but now you use your FUSE client for all the metadata operations that the native client would otherwise have needed to implement.

https://github.com/deepseek-ai/3FS/blob/ee9a5cee0a85c64f4797...
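As a rough illustration of that split (not 3FS's actual API): the sketch below opens a file through the ordinary path, which on a FUSE mount would be served by the FUSE client, and then issues the reads against the resulting fd through a separate function. `native_read` is a hypothetical stand-in that just calls `os.pread`; in 3FS that role is played by their native library, whose calls are not reproduced here.

```python
import os
import tempfile


def native_read(fd: int, length: int, offset: int) -> bytes:
    """Hypothetical stand-in for a native data-path read.

    It simply calls os.pread; it is NOT the 3FS client API.
    """
    return os.pread(fd, length, offset)


def read_chunks(path: str, chunk_size: int) -> list[bytes]:
    # Metadata path: a normal open() and fstat(), as a FUSE mount would serve.
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Data path: positional reads by fd at explicit offsets.
        return [
            native_read(fd, chunk_size, offset)
            for offset in range(0, size, chunk_size)
        ]
    finally:
        os.close(fd)


def demo() -> list[bytes]:
    # Write a small temporary file so the sketch runs anywhere.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"abcdefghij")
        path = f.name
    try:
        return read_chunks(path, 4)
    finally:
        os.unlink(path)
```

The point of the shape is that path lookup, permissions, and open/close stay on the standard POSIX path, while only the hot read loop would need to be ported to a native call.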
nickfixit 6 days ago
I've been using JuiceFS since the start for my AI stacks. It's similar, and I used PostgreSQL for the metadata.
threeseed 6 days ago
Tiered storage and FUSE have existed with Alluxio for years, as have NVMe optimisations, e.g. NVMe-oF in OpenEBS (Mayastor). None of it is particularly groundbreaking, just a lot of pieces brought together.
joatmon-snoo 6 days ago
nit: Colossus* for Google.
objectivefs 6 days ago
There is also ObjectiveFS, which supports FUSE and uses S3 for both data and metadata storage, so there is no need to run any metadata nodes. Using S3 instead of a separate database also lets both data and metadata scale with the performance of the S3 object store.