Veserv 7 months ago

You will generally only stall indefinitely if you are waiting for new data. So, you will actually handle almost every use case if your blocking read/wait also respects the timeout and does the flush on your behalf. Basically, do it synchronously at the top of your event loop and you will handle almost every case.
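A minimal sketch of that first option, assuming a Unix-like system where `select.select` works on pipe file descriptors. The names `DeadlineBuffer` and `wait_readable_or_flush` are hypothetical, made up for illustration and not taken from any library. The buffer records a deadline when it first holds data, and the blocking wait uses the time remaining as its timeout.

```python
import os
import select
import time


class DeadlineBuffer:
    """Hypothetical userspace write buffer with a flush deadline."""

    def __init__(self, fd, max_delay=0.05):
        self.fd = fd                # destination file descriptor
        self.max_delay = max_delay  # longest time data may sit unflushed (seconds)
        self.pending = bytearray()
        self.deadline = None        # None means nothing is buffered

    def write(self, data):
        # Start the clock only when the buffer goes from empty to non-empty.
        if data and not self.pending:
            self.deadline = time.monotonic() + self.max_delay
        self.pending += data

    def flush(self):
        data = bytes(self.pending)
        self.pending.clear()
        self.deadline = None
        while data:  # os.write may accept only part of the data
            written = os.write(self.fd, data)
            data = data[written:]

    def wait_budget(self):
        """Seconds the caller may block before a flush is due (None = forever)."""
        if self.deadline is None:
            return None
        return max(0.0, self.deadline - time.monotonic())


def wait_readable_or_flush(in_fd, out):
    """One blocking wait: True if in_fd is readable, False if it flushed instead."""
    ready, _, _ = select.select([in_fd], [], [], out.wait_budget())
    if ready:
        return True
    out.flush()
    return False
```

An event loop would call `wait_readable_or_flush` at the top of each iteration and repeat until it returns `True`. Once the buffer is empty the timeout becomes `None`, so the wait blocks indefinitely without leaving output stranded.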

You could also relax the guarantee and set a timeout that is only checked during your next write. This still allows unbounded latency, but as long as you do one more write it will flush.
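A rough sketch of this relaxed variant, with the age of the buffer checked only inside `write`. `LazyFlushWriter`, `emit`, `max_age`, and `max_bytes` are hypothetical names chosen for illustration, and the injectable `clock` parameter is an assumption added so the behavior can be exercised without sleeping.

```python
import time


class LazyFlushWriter:
    """Hypothetical buffered writer whose timeout is checked only on write."""

    def __init__(self, emit, max_age=0.05, max_bytes=4096, clock=time.monotonic):
        self.emit = emit            # callable that receives the flushed bytes
        self.max_age = max_age      # seconds buffered data may age before a write flushes it
        self.max_bytes = max_bytes  # usual size-based flush threshold
        self.clock = clock
        self.pending = bytearray()
        self.first_write = None

    def write(self, data):
        if not self.pending:
            self.first_write = self.clock()
        self.pending += data
        too_big = len(self.pending) >= self.max_bytes
        too_old = self.clock() - self.first_write >= self.max_age
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.pending:
            self.emit(bytes(self.pending))
            self.pending.clear()
```

Nothing here runs between writes, so stale data stays buffered until the next `write` call arrives. That is the unbounded latency mentioned above, in exchange for needing no timer or event loop.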

If neither of these works, then your program issues a write and then enters an unbounded or unreasonably long loop or computation. At that point you can manually flush what is likely the last write your program is ever going to make; the overhead is trivial, since it is a single write compared to a ridiculously long computation. That, or you probably have bigger problems.
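For the manual case, a small sketch using Python's standard `io.BufferedWriter` as the userspace buffer. The function name `announce_then_compute` and the `work` callback are hypothetical, introduced only to show the ordering.

```python
import io


def announce_then_compute(out, work):
    """Write a status line, flush it explicitly, then start the long computation."""
    out.write(b"starting long computation\n")
    out.flush()    # one extra flush, negligible next to the work that follows
    return work()  # stands in for the unbounded or very long loop


if __name__ == "__main__":
    raw = io.BytesIO()              # stands in for the real output destination
    out = io.BufferedWriter(raw)
    total = announce_then_compute(out, lambda: sum(i * i for i in range(1000)))
    print(raw.getvalue(), total)
```

Without the explicit `flush()`, the status line would sit in the `BufferedWriter` for the whole duration of `work()`.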

toast0 7 months ago

Yeah, these are all fine to do, but a libc can really only do the middle one, and even then at some cost.

If you're already using an event loop library, I think it's reasonable for that to manage flushing outputs while waiting for reads, but I don't think any of the utilities in this example do; maybe tcpdump does, but I don't know why grep would.

Veserv 7 months ago

Sure, but the article is talking about grep, not write() or libc implementations.

grep buffers writes with no flush timeout, resulting in the problem in the article.

grep should probably not suffer from the problem. It can use a write primitive/library/design that avoids such problems with relatively minimal extra complexity and dependencies while retaining the performance advantages of userspace buffering.

Most programs (those that are minimizing dependencies and so cannot pull in a large framework, like grep or other simple utilities) would benefit from using such modestly more complex primitives instead of bare buffered writes/reads. Such primitives are relatively easy to use and understand, being largely a drop-in replacement in most common use cases, and resolve most remaining problems with buffered accesses.

Essentially, this sort of primitive should be your default and you should only reach for lower level primitives in your application if you have a good reason for it and understand the problems the layers were designed to solve.

toast0 7 months ago

> Sure, but the article is talking about grep, not write() or libc implementations.

Yes, but you said

> In this case, the library that buffers in userspace should set appropriate timers when it first buffers the data

The library that buffers in userspace for grep and tcpdump is almost certainly libc.

Veserv 7 months ago

Okay, I should have said “a” instead of “the”, since there was no clear antecedent and that allowed exactly that single sentence to be interpreted ambiguously, which apparently invalidated the fact that I was obviously talking in generalities of API design and implementation.

It did not even occur to me that anybody would even think this was some sort of statement about whatever libc they use on Linux given that I said just “buffered accesses” with no reference to platform or transport channel.

I thought somebody might think I was talking about just writes, so I deliberately wrote accesses.

I thought somebody would make some sort of pedantic statement if I just said “should” so I wrote “should almost always”.

I thought somebody might think I was talking about write() in particular so I deliberately avoided talking about any specific API to head that off.

In my reply I deliberately said “blocking read/wait” instead of select() or epoll() or io_uring or whatever other thing they use these days, to avoid any confusion that it was a specific remedy for a specific library or API.

But, alas, here we are. My pedantry was no match for first contact. You will just have to forgive my inability to consider the dire implications of minor ambiguities.