londons_explore 6 days ago

This seems like a pretty complex setup with lots of features which aren't obviously important for a deep learning workload.

Presumably the key necessary features are PBs' worth of storage; read/write parallelism (which can be achieved by splitting a 1 PB file into, say, 10,000 100 GB shards and having each client read only the shards it needs); and redundancy.
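
As a rough sketch of that kind of sharded layout (the shard size, the shard-00000 naming, and the range math are all illustrative assumptions, not any real system's format):

    # Hypothetical layout: a large dataset stored as fixed-size shard
    # files named shard-00000, shard-00001, ..., 100 GB each.
    SHARD_SIZE = 100 * 10**9  # 100 GB per shard (illustrative)

    def shards_for_range(offset, length):
        """Return the shard indices covering a logical byte range."""
        first = offset // SHARD_SIZE
        last = (offset + length - 1) // SHARD_SIZE
        return range(first, last + 1)

    def read_range(offset, length):
        """Read a logical byte range by touching only the needed shards."""
        out = bytearray()
        for idx in shards_for_range(offset, length):
            shard_start = idx * SHARD_SIZE
            lo = max(offset, shard_start) - shard_start
            hi = min(offset + length, shard_start + SHARD_SIZE) - shard_start
            with open(f"shard-{idx:05d}", "rb") as f:  # hypothetical name
                f.seek(lo)
                out += f.read(hi - lo)
        return bytes(out)

Each client computes only the shards its byte range touches, so thousands of readers can proceed in parallel without coordinating.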

Consistency is hard to achieve and seems to have no use here: your programmers can make sure different processes write to different filenames.
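
For instance, a write path where each process derives its output name from its own rank needs no coordination at all; a minimal sketch, assuming the launcher exposes a RANK environment variable:

    import os

    # Assumed convention: each process derives its output name from its
    # own rank, so no two writers ever touch the same file.
    rank = int(os.environ.get("RANK", "0"))   # rank assumed set by launcher
    payload = b"model shard bytes..."         # stand-in for real data

    with open(f"checkpoint-rank{rank:04d}.bin", "wb") as f:
        f.write(payload)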

threeseed 6 days ago | parent | next [-]

> Consistency is hard to achieve and seems to have no use here

Famous last words.

It is very common when operating data platforms at this scale to lose a lot of nodes over time, especially in the cloud. So having a robust consistency/replication mechanism is vital to making sure your training job doesn't need to be restarted just because the block it needs sat on a node that went away.
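
To illustrate the point: a client that knows several replica locations for a block can fall through to the next one when a node has died, instead of failing the whole job. A toy sketch with hypothetical fetch callables, not 3FS's actual protocol:

    def read_block(block_id, replicas):
        """replicas: callables that each fetch the block from one node."""
        last_err = None
        for fetch in replicas:
            try:
                return fetch(block_id)
            except OSError as err:  # node down, block missing, etc.
                last_err = err
        raise RuntimeError(f"all replicas failed for block {block_id}") from last_err

Without the fallback, the first dead node turns into a restarted training run; with it, a lost node costs one extra round trip.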

ted_dunning 6 days ago | parent | next [-]

Sadly, these are often Famous First words.

What follows is a long period of saying, "See, distributed systems are easy for genius developers like me."

The last words are typically "oh shit", followed shortly (and oxymoronically) by "bye! gotta go".

londons_explore 6 days ago | parent | prev [-]

Indeed, redundancy is fairly important (although for the largest part, the training data, it actually doesn't matter if chunks are missing).

But the type of consistency they were talking about is strong ordering: the kind of thing you might want on a database with lots of people reading and writing tiny bits of data, potentially the same bits, where a user's writes must be rejected if they are impossible to fulfil and reads must never return an impossible intermediate state. That isn't needed for machine learning.
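
To make the contrast concrete, that strong ordering is the compare-and-set world sketched below, where a write is rejected if someone else committed in between; bulk ML shard reads and writes never need this check. A toy in-process register, not any particular database's API:

    import threading

    class Register:
        """Strongly consistent register: writes carry the version the
        writer last read and are rejected if it has since changed."""
        def __init__(self):
            self._lock = threading.Lock()
            self._version = 0
            self._value = None

        def read(self):
            with self._lock:
                return self._version, self._value

        def compare_and_set(self, expected_version, new_value):
            with self._lock:
                if self._version != expected_version:
                    return False  # the "impossible" write is rejected
                self._version += 1
                self._value = new_value
                return True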

sungam 6 days ago | parent | prev [-]

I wonder whether it may have been originally developed for the quantitative hedge fund.

huntaub 6 days ago | parent | next [-]

Yes, I think this is probably true. I've worked with a lot of different hedge funds that have a similar problem: lots of shared data that they need in a file system so that they can do backtesting of strategies with things like kdb+. Generally, these folks are using NFS, which is kind of a pain, especially for scalability, so building your own for that specific use case (which happens to have a similar usage pattern to AI training) makes a lot of sense.

ammo1662 5 days ago | parent | prev [-]

Yes, as I mentioned in other comments, 3FS was designed in 2019. You can check [0] (in Chinese).

[0] https://www.high-flyer.cn/blog/3fs/