riyaneel 2 days ago

You're right on the MPSC point, the ADR overstates it. Aeron's claim-based approach uses a single fetch_add per producer, no retry loop. The real constraints are bounded message size upfront and a caretaker thread for reclamation, not a CAS retry. The wording needs fixing. On the SPSC counter argument, Tachyon already does most of what you describe: inline headers, head/tail on separate 128-byte cache lines, cached tail only reloaded on apparent fullness, tail writes amortized across 32 messages. If you have numbers comparing the single-counter approach against this specific layout I'd be genuinely curious.
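
For reference, a minimal sketch of the dual-counter layout described above. This is not Tachyon's actual code: the class name `DualCounterRing`, the fixed `uint64_t` payload, and the member names are all assumptions made for illustration, and the 32-message batching of tail writes is left out to keep it short. It only shows head and tail on separate 128-byte lines, with each side reloading the other's counter only when the ring looks full (producer) or empty (consumer).

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical dual-counter SPSC ring. One producer thread, one consumer thread.
template <std::size_t Capacity>
class DualCounterRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");

public:
    // Producer side. Returns false if the ring is full.
    bool try_push(std::uint64_t value) {
        const std::uint64_t head = head_.load(std::memory_order_relaxed);
        if (head - cached_tail_ == Capacity) {
            // Apparent fullness: only now touch the consumer's cache line.
            cached_tail_ = tail_.load(std::memory_order_acquire);
            if (head - cached_tail_ == Capacity) return false;
        }
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the message
        return true;
    }

    // Consumer side. Returns nullopt if the ring is empty.
    std::optional<std::uint64_t> try_pop() {
        const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == cached_head_) {
            // Apparent emptiness: reload the producer's counter.
            cached_head_ = head_.load(std::memory_order_acquire);
            if (tail == cached_head_) return std::nullopt;
        }
        const std::uint64_t value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Producer-owned line: its counter plus its cached view of the tail.
    alignas(128) std::atomic<std::uint64_t> head_{0};
    std::uint64_t cached_tail_{0};
    // Consumer-owned line: its counter plus its cached view of the head.
    alignas(128) std::atomic<std::uint64_t> tail_{0};
    std::uint64_t cached_head_{0};
    alignas(128) std::uint64_t slots_[Capacity]{};
};
```

Note the "apparent emptiness" branch in `try_pop`: when the consumer stays about one message behind, that reload of `head_` happens on nearly every pop.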

nly 2 days ago | parent [-]

The main issue with dual counters is that most of the time, in low-latency use cases, your consumer is ~1 message behind the producer.

This means your consumer isn't getting a lot of benefit from caching the producer's position. The queue appears empty the majority of the time, so it has to reload the counter (causing it to claim the cache line).

Meanwhile the producer goes to write message N+1 and update the counter again, and has to claim the line back (Shared to Modified in MESI), when it could have just set a completion flag in the message header, which the consumer hasn't touched in ages (since the ring buffer last lapped). And the producer has just written data to this line anyway, so it already holds it exclusively.

So when your queue is almost always empty, this counter is just another cache line being ping-ponged between cores.

This gets back to Aeron. In Aeron's design the reader can get ahead of the writer and it's safe.
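
A rough sketch of the flag-in-header idea, to make the comparison concrete. This is not Aeron's code or anyone's production ring: `FlagRing`, the `Slot` layout, and the use of a per-slot sequence number as the completion flag are assumptions for illustration. The consumer polls the header of the next slot instead of a shared producer counter, and the producer only reads the consumer's position when the ring looks full.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical SPSC ring where each message header carries its own ready flag.
template <std::size_t Capacity>
class FlagRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");

    struct alignas(64) Slot {
        // Completion flag: holds position + 1 of the message stored here,
        // or 0 if the slot has never been written.
        std::atomic<std::uint64_t> seq{0};
        std::uint64_t payload{0};
    };

public:
    // Producer side. Returns false if the ring is full.
    bool try_push(std::uint64_t value) {
        if (write_pos_ - cached_read_pos_ == Capacity) {
            // Only on apparent fullness do we read the consumer's line.
            cached_read_pos_ = read_pos_.load(std::memory_order_acquire);
            if (write_pos_ - cached_read_pos_ == Capacity) return false;
        }
        Slot& slot = slots_[write_pos_ & (Capacity - 1)];
        slot.payload = value;
        // Publishing is a store to the line we just wrote the payload into.
        slot.seq.store(write_pos_ + 1, std::memory_order_release);
        ++write_pos_;
        return true;
    }

    // Consumer side. Returns nullopt if the next message is not ready yet.
    std::optional<std::uint64_t> try_pop() {
        const std::uint64_t pos = read_pos_.load(std::memory_order_relaxed);
        Slot& slot = slots_[pos & (Capacity - 1)];
        // A stale flag from an earlier lap never equals pos + 1.
        if (slot.seq.load(std::memory_order_acquire) != pos + 1) return std::nullopt;
        const std::uint64_t value = slot.payload;
        read_pos_.store(pos + 1, std::memory_order_release);
        return value;
    }

private:
    // Producer-local state: there is no shared producer counter at all.
    alignas(128) std::uint64_t write_pos_{0};
    std::uint64_t cached_read_pos_{0};
    // Consumer position, read by the producer only for back-pressure.
    alignas(128) std::atomic<std::uint64_t> read_pos_{0};
    alignas(128) Slot slots_[Capacity];
};
```

In the almost-always-empty case the only line that moves between cores is the slot itself, which has to move anyway because it carries the payload.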

riyaneel 2 days ago | parent [-]

Fair point on the head cache line. Tachyon's target is cross-language zero-copy IPC, not squeezing the last nanosecond out of a pure C++ ring. Different tradeoff.