anorwell 3 days ago

Seems like a really interesting project! I don't understand what's going on with latency vs durability here. The benchmarks [1] report ~1ms latency for sequential writes, but that's just not possible with S3. So presumably writes are not being confirmed to storage before confirming the write to the client.

What is the durability model? The docs don't talk about intermediate storage. SlateDB does confirm writes to S3 by default, but I assume that's not happening here?

[1] https://www.zerofs.net/zerofs-vs-juicefs

Shakahs 3 days ago | parent

SlateDB offers different durability levels for writes. By default writes are buffered locally and flushed to S3 when the buffer is full or the client invokes flush().

https://slatedb.io/docs/design/writes/

Eikon 3 days ago | parent

The durability profile before sync should be pretty close to a local filesystem's. Writes are buffered in memory; data is synced when fsync is issued, when we exceed the in-memory threshold, or when we exceed a timeout.
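A toy sketch of that flush policy (not ZeroFS's actual code; the class name and the threshold/timeout values are made up for illustration): an in-memory buffer whose contents become durable only when flushed, which happens on fsync, when a size threshold is exceeded, or when a timeout elapses.

```python
import time

class WriteBuffer:
    """Toy model of write buffering: data is durable only after a flush,
    triggered by fsync, a size threshold, or a timeout."""

    def __init__(self, backing, threshold=64 * 1024, timeout=1.0):
        self.backing = backing        # list standing in for object storage
        self.threshold = threshold    # flush once this many bytes are buffered
        self.timeout = timeout        # flush once this many seconds have passed
        self.buf = bytearray()
        self.last_flush = time.monotonic()

    def write(self, data: bytes):
        self.buf += data
        if (len(self.buf) >= self.threshold
                or time.monotonic() - self.last_flush >= self.timeout):
            self.flush()

    def fsync(self):
        # fsync forces everything buffered so far to "stable storage"
        self.flush()

    def flush(self):
        if self.buf:
            self.backing.append(bytes(self.buf))
            self.buf.clear()
        self.last_flush = time.monotonic()
```

Until one of the three triggers fires, acknowledged writes exist only in memory, which is why unsynced benchmark writes can complete in ~1ms despite S3 round-trips being far slower.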

anorwell 2 days ago | parent

Thanks, that makes sense. I looked at the benchmark source and confirmed it's not fsyncing, so only some of the files will be durable by the time the benchmark finishes. The benchmark docs might benefit from discussing this, or from benchmarking both cases? O_SYNC / fsync before file close is an important use case.

edit: A quirk of using NFSv3 here is that there's no specific close op. So, if I understand right, ZeroFS' "close-to-open consistency" doesn't imply durability on close (and can't, unless every NFS op is durable before returning), only on fsync. Whereas EFS and (I think?) Azure Files do have this property.

Eikon 2 days ago | parent

There's an NFSv3 COMMIT operation, combined with a "durability" marker on writes. fsync could translate to COMMIT, but if writes are marked as "durable", common clients don't call COMMIT at all, and if writes are marked as non-durable, COMMIT is called after every operation, which kind of defeats the point. So when you use NFS with ZeroFS, you can't really rely on fsync.

I'd recommend using 9P when that matters, which has proper semantics there. One property of ZeroFS is that any file you fsync actually syncs everything else too.