Remix.run Logo
Veserv 7 months ago

Sure, but the article is talking about grep, not write() or libc implementations.

grep buffers writes with no flush timeout resulting in the problem in the article.

grep should probably not suffer from the problem and can use a write primitive/library/design that avoids such problems with relatively minimal extra complexity and dependencies while retaining the performance advantages of userspace buffering.

Most programs (that are minimizing dependencies so can not pull in a large framework, like grep or other simple utilities) would benefit from using such modestly more complex primitives instead of bare buffered writes/reads. Such primitives are relatively easy to use and understand, being largely a drop-in replacement in most common use cases, and resolve most remaining problems with buffered accesses.

Essentially, this sort of primitive should be your default and you should only reach for lower level primitives in your application if you have a good reason for it and understand the problems the layers were designed to solve.

toast0 7 months ago | parent [-]

> Sure, but the article is talking about grep, not write() or libc implementations.

Yes, but you said

> In this case, the library that buffers in userspace should set appropriate timers when it first buffers the data

The library that buffers in userspace for grep and tcpdump is almost certainly libc.

Veserv 7 months ago | parent [-]

Okay, I should have said “a” instead of using “the” when there is no clear antecedent allowing it to be interpreted ambiguously in exactly that single sentence which apparently invalidated the fact that I was obviously talking in generalities of API design and implementation.

It did not even occur to me that anybody would even think this was some sort of statement about whatever libc they use on Linux given that I said just “buffered accesses” with no reference to platform or transport channel.

I thought somebody might think I was talking about just writes, so I deliberately wrote accesses.

I thought somebody would make some sort of pedantic statement if I just said “should” so I wrote “should almost always”.

I thought somebody might think I was talking about write() in particular so I deliberately avoided talking about any specific API to head that off.

In my reply I deliberately said “blocking read/wait” instead of select() or epoll() or io_uring or whatever other thing they use these days to avoid such confusion that it was a specific remedy for a specific library or API.

But, alas, here we are. My pedantry was no match for first contact. You will just have to forgive my inability to consider the dire implications of minor ambiguities.