kvemkon 4 days ago

> 128 KB appears a point of diminishing returns, larger block sizes yield similar or worse performance.

Indeed, 128 KB has long been a well-known optimal buffer size [1], [2].

That is, until it was recently increased to 256 KB (07.04.2024) [3].

[1] https://github.com/MidnightCommander/mc/commit/e7c01c7781dcd...

[2] https://github.com/MidnightCommander/mc/issues/2193

[3] https://github.com/MidnightCommander/mc/commit/933b111a5dc7d...

jandrewrogers 4 days ago | parent | next [-]

This doesn't generalize.

In 2014, the common heuristic was 256kB based on measurements in many systems, so the 128kB value is in line with that. At the time, optimal block sizing wasn't that sensitive to the I/O architecture so many people arrived at the same values.

In 2024, the optimal block size based on measurement largely reflects the quality and design of your I/O architecture. Vast improvements in storage hardware expose limitations of the software design to a much greater extent than a decade ago. As a general observation, the optimal I/O sizing in sophisticated implementations has been trending toward smaller sizes over the last decade, not larger.

The seeming optimality of large block sizes is often a symptom of an I/O scheduling design that can't keep up with the performance of current storage hardware.
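Since the optimal size comes down to measurement, here is a minimal sketch of a block-size sweep for plain sequential reads. The function names and the list of sizes are made up for illustration, and it uses ordinary blocking reads through the page cache, so it says nothing about how a more sophisticated I/O scheduler would behave on the same hardware.

```python
import time


def time_sequential_read(path, block_size):
    """Read the whole file in block_size chunks; return (bytes_read, seconds)."""
    buf = bytearray(block_size)
    total = 0
    start = time.perf_counter()
    # buffering=0 gives a raw file object, so each readinto() is one read call.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:  # 0 means end of file
                break
            total += n
    return total, time.perf_counter() - start


def sweep(path, sizes=(4096, 16384, 65536, 131072, 262144, 1048576)):
    """Return {block_size: bytes_per_second} for each candidate size (illustrative list)."""
    results = {}
    for size in sizes:
        total, secs = time_sequential_read(path, size)
        results[size] = total / secs if secs > 0 else float("inf")
    return results
```

Repeated runs over the same file will mostly hit the page cache, so for numbers that reflect the device rather than memory you would need a file larger than RAM, or to drop caches between runs.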

marginalia_nu 4 days ago | parent [-]

I think what you're trying to accomplish is a factor here.

If you just want to saturate the bandwidth, to move some coherent blob of data from point A to point B as fast as possible (say you're implementing the `cp` command), then using large buffers is the best and easiest way. Small buffers confer no additional benefit other than driving more complicated designs, forcing io_uring with registered buffers and fds, etc.
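To make the "large buffers are the easy way" case concrete, here is a minimal sketch of a `cp`-style copy loop with one reusable buffer. The name `copy_file` and the 128 KiB default are assumptions for illustration; this is not how any particular `cp` implementation does it.

```python
def copy_file(src, dst, bufsize=128 * 1024):
    """Copy src to dst through one fixed-size buffer; return bytes copied."""
    buf = bytearray(bufsize)
    view = memoryview(buf)  # lets us slice without copying
    total = 0
    # buffering=0 so Python adds no buffering layer on top of ours.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        while True:
            n = fin.readinto(view)
            if not n:  # 0 means end of file
                break
            # A raw write may be short, so loop until the chunk is fully written.
            off = 0
            while off < n:
                off += fout.write(view[off:n])
            total += n
    return total
```

The whole design is one loop and one buffer; the only tuning knob is `bufsize`.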

If you want to maximize IOPS, then small buffers are the only viable option, since we just established that large buffers are what saturate the bandwidth. But then you need to whittle down the per-read overhead, and you end up with io_uring or even more specialized tools.

marginalia_nu 4 days ago | parent | prev [-]

I wonder if a more robust option is to peek at the sysfs queue info on Linux.

It has some nice information about hardware I/O operation limits, and also an optimal_io_size hint.

https://www.kernel.org/doc/html/v5.3/block/queue-sysfs.html
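
A minimal sketch of what that peek could look like, assuming the attribute files described on the page above live under `/sys/block/<device>/queue/`. The function names, the fallback value, and the choice of attributes are mine, for illustration only. Note that `optimal_io_size` can be 0 when the device reports no preference, so a fallback is still needed.

```python
from pathlib import Path

# Attribute names taken from the queue-sysfs documentation linked above.
QUEUE_ATTRS = (
    "optimal_io_size",      # bytes
    "minimum_io_size",      # bytes
    "max_sectors_kb",       # KiB
    "max_hw_sectors_kb",    # KiB
    "logical_block_size",   # bytes
    "physical_block_size",  # bytes
)


def read_queue_limits(device, sysfs_root="/sys/block"):
    """Return {attribute: int or None} for a block device's queue directory."""
    queue = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in QUEUE_ATTRS:
        try:
            limits[name] = int((queue / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None  # missing or unreadable attribute
    return limits


def pick_buffer_size(limits, fallback=128 * 1024):
    """Use the optimal_io_size hint if it is set, else the (assumed) fallback."""
    hint = limits.get("optimal_io_size")
    return hint if hint else fallback
```

Usage would be something like `pick_buffer_size(read_queue_limits("nvme0n1"))`, where the device name is just an example; mapping a file path to its underlying block device is left out here.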