sethev 3 days ago

This seems sketchy. O_DIRECT skips the operating system's page cache, but it does not guarantee that the SSD driver sent the data to the SSD or issued a flush to the drive itself. The data could still be in the driver's memory or in non-durable memory in the drive itself when this engine says "ok, we're good".

EDIT: sketchy from the perspective of answering "what exactly are the guarantees?"

jandrewrogers 3 days ago | parent [-]

The model here is that the storage device is directly reading and writing the userspace buffer via DMA. It is one of the reasons use of O_DIRECT creates additional constraints on buffer alignment and size.
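To make the alignment constraint concrete, here is a minimal sketch of preparing a buffer that would be acceptable for an O_DIRECT write. It assumes a 4096-byte logical block size, which is common but not universal, and the helper names (`round_up`, `aligned_buffer`, `buffer_address`) are made up for illustration. An anonymous mmap is used only because it hands back page-aligned memory.

```python
import ctypes
import mmap


def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return -(-n // align) * align


def aligned_buffer(payload: bytes, block: int = 4096) -> mmap.mmap:
    """Copy payload into a page-aligned, zero-padded buffer.

    The buffer length is a multiple of `block` (an assumed logical
    block size), and the start address is page-aligned because
    anonymous mmap regions always begin on a page boundary.
    """
    size = round_up(max(len(payload), 1), block)
    buf = mmap.mmap(-1, size)          # anonymous, zero-filled mapping
    buf[:len(payload)] = payload       # remainder stays zero padding
    return buf


def buffer_address(buf: mmap.mmap) -> int:
    """Return the start address of the mapping, for checking alignment."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))
```

A buffer built this way can be passed to `os.write` on a file descriptor opened with `os.O_DIRECT`. Note that satisfying the alignment rules only gets the write accepted; it says nothing about whether the data is durable afterwards.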

Some storage devices guarantee durability of writes they have accepted but not yet persisted, and this is explicitly part of their model. Consequently, the entire durable write path is the storage device completing a DMA read of the userspace buffer.

The underlying assumptions will not hold true for every environment. However, they will hold true for many, and you can check most (all?) of them at runtime.
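As one example of such a runtime check, the sketch below reads what the Linux block layer reports about a device's write cache through sysfs. The function names are hypothetical and the disk name is whatever appears under /sys/block on the machine (for instance an NVMe namespace). On virtualized hardware the value reflects what the hypervisor exposes, which may not match the physical media, so treat it as one signal rather than proof.

```python
from pathlib import Path
from typing import Optional


def parse_write_cache(text: str) -> bool:
    """Return True if the kernel reports a volatile write-back cache.

    The sysfs attribute holds either "write back" (a cache that needs
    explicit flushes) or "write through" (no flush required).
    """
    return text.strip() == "write back"


def device_has_volatile_cache(disk: str) -> Optional[bool]:
    """Check /sys/block/<disk>/queue/write_cache for the given disk.

    Returns None when the attribute cannot be read, for example on a
    non-Linux system or when the disk name does not exist.
    """
    path = Path("/sys/block") / disk / "queue" / "write_cache"
    try:
        return parse_write_cache(path.read_text())
    except OSError:
        return None
```

If this returns True, a completed O_DIRECT write may still sit in volatile device memory until a flush is issued (fsync, or opening with O_DSYNC). If it returns False, the device claims that completed writes need no further flush.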

sethev 3 days ago | parent [-]

Right - I mean, what you're describing makes sense, but it doesn't sound like what they're describing. Their benchmarks are running on an EC2 instance and the post's author is here saying that they run on virtualized hardware. Plus they run on top of a file system. None of that screams "direct DMA from our buffers" to me.

I'm not saying it's impossible, but typically people who want to lean on hardware guarantees for extra performance control more of the stack.