> Most of the metadata activity is contained within a single shard:
>
> - File creation, same-directory renames, and deletion.
> - Listing directory contents.
> - Getting attributes of files or directories.
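
One way to read the quoted design, sketched under an assumption the quote itself doesn't state: each entry's metadata lives on the shard that owns its parent directory. Under that placement, creating, deleting, or stat-ing an entry and listing a directory each touch one shard, and so does a rename within a single directory, while a cross-directory rename may span two. `NUM_SHARDS`, `shard_for_directory`, and the SHA-256 placement are hypothetical choices for illustration only.

```python
import hashlib
import posixpath

NUM_SHARDS = 16  # hypothetical shard count, not taken from the source


def shard_for_directory(dir_path: str) -> int:
    """Map a directory to a shard by hashing its normalized path (illustrative placement)."""
    normalized = posixpath.normpath(dir_path)
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shard_for_entry(path: str) -> int:
    """Assumption: an entry's metadata is stored on its parent directory's shard."""
    return shard_for_directory(posixpath.dirname(posixpath.normpath(path)))


def shards_touched_by_rename(src: str, dst: str) -> set[int]:
    """Return the set of shards a rename would have to coordinate."""
    return {shard_for_entry(src), shard_for_entry(dst)}


if __name__ == "__main__":
    # Same-directory rename: both names share a parent, so exactly one shard.
    print(shards_touched_by_rename("/data/a.parquet", "/data/b.parquet"))
    # Cross-directory rename: the two parents may hash to different shards.
    print(shards_touched_by_rename("/data/a.parquet", "/archive/a.parquet"))
```

Listing `/data` only needs `shard_for_directory("/data")`, because every child of `/data` was placed there by `shard_for_entry`.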

I guess this is a trade-off between a file system and an object store? In S3, ListObjects() is a heavy hitter, and there can potentially be billions of objects under any prefix. Scanning on a single instance won't be sufficient.
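
To make the ListObjects point concrete: an object store's keys form a flat, sorted namespace, and a prefix listing is a range scan returned in pages (S3's ListObjectsV2 returns at most 1,000 keys per request plus a continuation token). The sketch below is a toy in-memory model under that assumption, not the S3 API. `FlatKeyIndex`, `list_page`, and `list_all` are hypothetical names, and the resume token here is simply the last key returned, similar to ListObjectsV2's `StartAfter` parameter; real continuation tokens are opaque.

```python
import bisect
from typing import Optional


class FlatKeyIndex:
    """Toy model of an object store's flat, sorted key namespace (hypothetical)."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def list_page(self, prefix: str, max_keys: int = 1000,
                  start_after: Optional[str] = None) -> tuple[list[str], Optional[str]]:
        """Return up to max_keys keys starting with prefix, plus a resume token if more remain."""
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        # Seek to the first candidate: the prefix itself, or just past the resume token.
        i = bisect.bisect_left(self._keys, prefix)
        if start_after is not None:
            i = max(i, bisect.bisect_right(self._keys, start_after))
        page = []
        while i < len(self._keys) and len(page) < max_keys:
            key = self._keys[i]
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match the prefix
            page.append(key)
            i += 1
        more = i < len(self._keys) and self._keys[i].startswith(prefix)
        return page, (page[-1] if more and page else None)


def list_all(index: FlatKeyIndex, prefix: str, max_keys: int = 1000) -> list[str]:
    """Drain every page; each request depends on the previous token, so pages are sequential."""
    keys, token = [], None
    while True:
        page, token = index.list_page(prefix, max_keys, token)
        keys.extend(page)
        if token is None:
            return keys
```

With a billion keys under one prefix and 1,000 keys per page, `list_all` makes a million sequential page requests, which is why a single scanner struggles at that scale.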


jeffinhat 3 days ago

It's definitely a different use case, but given that they haven't had to tap into their follower replicas for scale, it must be pretty efficient and lightweight. I suspect not having ACLs helps. They also cite a minimum 2MB size, so they're not expecting exabytes of tiny files.

I wonder if a major difference is listing a prefix in object storage vs performing recursive listings in a file system?
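
A toy model of that difference, with the directory-tree representation and all names (`recursive_listing`, `prefix_listing`, `EXAMPLE_TREE`) being hypothetical: a recursive file-system listing needs one directory read per directory it discovers, and each level's reads can only start once the parent level has been read, whereas a flat prefix listing covers every "/"-separated level in one pass over the keys. The filter in `prefix_listing` stands in for the sorted range scan a real store would use.

```python
from collections import deque

# Hypothetical model: directory path -> list of (child name, is_directory).
EXAMPLE_TREE = {
    "/data": [("2024", True), ("readme.txt", False)],
    "/data/2024": [("01", True), ("a.parquet", False)],
    "/data/2024/01": [("b.parquet", False)],
}


def recursive_listing(tree: dict, root: str) -> tuple[list[str], int]:
    """Breadth-first walk; returns (sorted file paths, number of directory reads)."""
    files, reads = [], 0
    queue = deque([root])
    while queue:
        directory = queue.popleft()
        reads += 1  # one readdir-style call per directory visited
        for name, is_dir in tree.get(directory, []):
            child = f"{directory.rstrip('/')}/{name}"
            if is_dir:
                queue.append(child)  # must be read later, after this level
            else:
                files.append(child)
    return sorted(files), reads


def prefix_listing(keys: list[str], prefix: str) -> list[str]:
    """Object-store style: all keys under the prefix, no per-directory hops."""
    return sorted(k for k in keys if k.startswith(prefix))


if __name__ == "__main__":
    print(recursive_listing(EXAMPLE_TREE, "/data"))
```

Here the recursive walk needs three directory reads for three files, while the flat listing is a single scan whatever the nesting depth.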

Even in S3, very large listings over a prefix are slow, and small files will always be slow to work with, so regular compaction and caching file names are usually worthwhile.
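
The compaction idea can be sketched as follows, assuming small files are rewritten into one larger segment alongside an in-memory index of `(offset, length)` per name, so reads become byte-range slices and name listings are answered from the index rather than by a store-side listing. `PackedSegment` and `compact` are hypothetical; a real system would also persist the index and handle deletes and overwrites.

```python
from dataclasses import dataclass, field


@dataclass
class PackedSegment:
    """Hypothetical compacted blob: small files concatenated, plus a name index."""
    data: bytes = b""
    index: dict[str, tuple[int, int]] = field(default_factory=dict)  # name -> (offset, length)

    def read(self, name: str) -> bytes:
        """Serve one original file as a byte range of the segment."""
        offset, length = self.index[name]
        return self.data[offset:offset + length]

    def names(self, prefix: str = "") -> list[str]:
        """List names from the cached index instead of listing the store."""
        return sorted(n for n in self.index if n.startswith(prefix))


def compact(small_files: dict[str, bytes]) -> PackedSegment:
    """Concatenate small files into one segment, recording where each one lands."""
    buf = bytearray()
    index = {}
    for name in sorted(small_files):
        payload = small_files[name]
        index[name] = (len(buf), len(payload))
        buf.extend(payload)
    return PackedSegment(bytes(buf), index)
```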

jleahy 3 days ago

2MB median, to be fair, so half of our files are under 2MB.