toast0 9 hours ago

I think this is the right approach, but any libc setting automatic timers would lead to a lot of tricky problems because it would change expectations.

I/O errors could occur at any point, instead of only when you write. Syscalls everywhere could be interrupted by a timer, instead of only where the program set timers, or when a signal arrives. There's also a reasonable chance of confusion when the application and libc both set timers, depending on how the timers are set (although maybe this isn't relevant anymore... kernel timer APIs look better than I remember). If the application specifically blocks signals for critical sections, that impacts the I/O timers, etc.

There's also a need to be more careful in accessing I/O structures because of when and how signals get handled.

Veserv 9 hours ago | parent | next [-]

You will generally only stall indefinitely if you are waiting for new data. So, you will actually handle almost every use case if your blocking read/wait also respects the timeout and does the flush on your behalf. Basically, do it synchronously at the top of your event loop and you will handle almost every case.

You could also relax the guarantee and set a timeout that is only checked during your next write. This still allows unbounded latency, but as long as you do one more write it will flush.

If neither of these works, then your program issues a write and then gets into an unbounded or unreasonably long loop/computation. At that point you can manually flush what is likely the last write your program is ever going to make, which would be a trivial overhead since that is a single write compared to a ridiculously long computation. That or you probably have bigger problems.
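
A rough sketch of the first two options, in Python for brevity. The class name `DeadlineWriter`, the `max_delay` and `capacity` parameters, and the `wait_readable` helper are all made up for illustration; this is not how grep or any libc does it, just one way the idea could look. `write` checks the deadline only on the next write (the relaxed guarantee), and `wait_readable` is a blocking wait that flushes on your behalf once the deadline passes.

```python
import os
import select
import time


class DeadlineWriter:
    """Hypothetical buffered writer with a flush deadline (illustration only)."""

    def __init__(self, fd, max_delay=0.1, capacity=4096):
        self.fd = fd
        self.max_delay = max_delay  # max seconds data may sit in the buffer
        self.capacity = capacity    # flush once this many bytes are buffered
        self.buf = bytearray()
        self.oldest = None          # monotonic time of the first unflushed byte

    def _expired(self):
        return (self.oldest is not None
                and time.monotonic() - self.oldest >= self.max_delay)

    def write(self, data):
        if not self.buf:
            self.oldest = time.monotonic()
        self.buf += data
        # Relaxed guarantee: the deadline is only checked when a write happens.
        if len(self.buf) >= self.capacity or self._expired():
            self.flush()

    def flush(self):
        data = bytes(self.buf)
        self.buf.clear()
        self.oldest = None
        while data:
            n = os.write(self.fd, data)  # may be a partial write
            data = data[n:]

    def wait_readable(self, rfd):
        """Block until rfd is readable, flushing if the deadline passes first."""
        while True:
            timeout = None  # nothing buffered: block indefinitely
            if self.buf:
                remaining = self.oldest + self.max_delay - time.monotonic()
                timeout = max(0.0, remaining)
            ready, _, _ = select.select([rfd], [], [], timeout)
            if ready:
                return
            self.flush()  # timed out with data still pending
```

No timers or signals are involved: the flush only ever happens inside `write`, `flush`, or `wait_readable`, so errors still surface at points where the program is already doing I/O.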

toast0 7 hours ago | parent [-]

Yeah, these are all fine to do, but a libc can really only do the middle one. And then, at some cost.

If you're already using an event loop library, I think it's reasonable for that to manage flushing outputs while waiting for reads, but I don't think any of the utilities in this example do; maybe tcpdump does, but I don't know why grep would.

Veserv an hour ago | parent [-]

Sure, but the article is talking about grep, not write() or libc implementations.

grep buffers writes with no flush timeout, resulting in the problem in the article.

grep should probably not suffer from the problem and can use a write primitive/library/design that avoids such problems with relatively minimal extra complexity and dependencies while retaining the performance advantages of userspace buffering.

Most programs (those that are minimizing dependencies and so cannot pull in a large framework, like grep or other simple utilities) would benefit from using such modestly more complex primitives instead of bare buffered writes/reads. Such primitives are relatively easy to use and understand, being largely a drop-in replacement in most common use cases, and resolve most remaining problems with buffered accesses.

Essentially, this sort of primitive should be your default and you should only reach for lower level primitives in your application if you have a good reason for it and understand the problems the layers were designed to solve.

nine_k 9 hours ago | parent | prev [-]

I don't follow. Using a pipe sets an expectation of some amount of asynchronicity, because we only control one end of the pipe. I don't see a dramatic difference between an error occurring because the process on the other end is having trouble, or because a timeout handler is trying to push the bytes.

On the reading end, the error may occur at the attempt to read the pipe.

On the writing end, the error may be signaled at the next attempt to write to or close the pipe.

In either case, a SIGPIPE can be sent asynchronously.

What scenario am I missing?

toast0 7 hours ago | parent [-]

> In either case, a SIGPIPE can be sent asynchronously.

My expectation (and I think this is an accurate expectation) is that a) read does not cause a SIGPIPE; read on a widowed pipe returns a zero-count read as an indication of EOF. b) write on a widowed pipe raises SIGPIPE before the write returns. c) write to a pipe that is valid will not raise SIGPIPE if the pipe is later widowed without being read from.

Yes, you could get a SIGPIPE from anywhere at any time, but unless someone is having fun on your system with random kills, you won't actually get one except immediately after a write to a pipe. With a timer-based asynchronous write, this changes to potentially happening at any time.

This could be fine if it was well documented and expected, but it would be a mess to add it into the libcs at this point. Probably a mess to add it to basic output buffering in most languages.
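
For what it's worth, expectations (a) through (c) are easy to check with a small sketch like the one below (Python, with made-up function names). One assumption to be aware of: CPython sets SIGPIPE to ignored at startup, so the failed write shows up as `BrokenPipeError` (EPIPE) rather than killing the process. With the default disposition the signal would be delivered at the same point, during the write.

```python
import os


def read_after_writer_closed():
    """(a) Reading a widowed pipe yields EOF, not a signal."""
    r, w = os.pipe()
    os.close(w)
    data = os.read(r, 16)  # zero-length read signals EOF
    os.close(r)
    return data


def write_after_reader_closed():
    """(b) and (c): the error appears only inside a later write call."""
    r, w = os.pipe()
    os.write(w, b"queued")  # (c) succeeds: the read end is still open
    os.close(r)             # reader goes away without reading; nothing reported yet
    try:
        os.write(w, b"more")  # (b) fails synchronously, before write returns
        return "ok"
    except BrokenPipeError:
        return "EPIPE"
    finally:
        os.close(w)
```

On Linux or macOS I'd expect `read_after_writer_closed()` to return an empty bytes object and `write_after_reader_closed()` to return `"EPIPE"`. The point is that the failure is tied to the write call itself, which is exactly what a timer-driven flush would change.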