amluto 2 days ago

I gave this a quick skim, and:

> - Strict SPSC & No CAS: I went with a strict Single-Producer Single-Consumer topology. There are no compare-and-swap loops on the hot path. acquire_tx and acquire_rx are essentially just a load, a mask, and a branch using memory_order_acquire / release.

> - Hybrid Wait Strategy: The consumer spins for a bounded threshold using cpu_relax(), then falls back to a sleep via SYS_futex (Linux) or __ulock_wait (macOS) to prevent CPU starvation.

You can't actually achieve both of these at once, right? In "pure_spin" mode you can write without seq_cst, but in hybrid wait mode I think you need some seq_cst operation to avoid a race in which you fail to wake the consumer: the producer publishes and then checks whether the consumer is asleep, the consumer announces that it is going to sleep and then re-checks for data, and with only acquire/release each side can miss the other's store. This is an IMO obnoxious general problem with any sort of lightweight wake operation, and I haven't seen a great solution. I wish there were one, and I imagine it would be doable with only smallish amounts of hardware help or maybe even very clever kernel help. You can avoid it (at extreme cost) with membarrier(), but I struggle to imagine the use case where that's a win, and it's certainly not a win in cases where you really want to avoid tail latency.
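To make the race concrete, here is a minimal sketch of the usual store-then-load handshake. It is not the project's code: `WakeChannel`, `publish`, `wait_past`, `run_demo`, the `sleeping` flag, and the spin count of 1000 are all made up for illustration, and a mutex plus condition variable stands in for the futex / `__ulock_wait` call so the example stays portable. The two `seq_cst` fences are the part I'm arguing is required; drop either one and each side can read the other's old value, so the consumer parks with data pending and nobody wakes it.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the ring buffer's wake path (illustration only).
struct WakeChannel {
    std::atomic<std::uint64_t> tail{0};      // items published so far
    std::atomic<std::uint32_t> sleeping{0};  // 1 while the consumer may be parked
    std::mutex m;                            // portable stand-in for futex wait/wake
    std::condition_variable cv;

    // Producer side (single producer).
    void publish() {
        tail.store(tail.load(std::memory_order_relaxed) + 1,
                   std::memory_order_release);
        // Store->load ordering: release alone does not stop the load below
        // from being satisfied before the tail store becomes visible.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (sleeping.load(std::memory_order_relaxed) != 0) {
            {
                std::lock_guard<std::mutex> lk(m);
                sleeping.store(0, std::memory_order_relaxed);
            }
            cv.notify_one();  // slow path only: consumer announced a sleep
        }
    }

    // Consumer side: returns the new tail once it differs from `seen`.
    std::uint64_t wait_past(std::uint64_t seen) {
        for (int i = 0; i < 1000; ++i) {  // bounded spin, acquire loads only
            std::uint64_t t = tail.load(std::memory_order_acquire);
            if (t != seen) return t;
        }
        for (;;) {
            sleeping.store(1, std::memory_order_relaxed);  // announce intent
            std::atomic_thread_fence(std::memory_order_seq_cst);
            std::uint64_t t = tail.load(std::memory_order_acquire);
            if (t != seen) {  // data arrived while we were announcing
                sleeping.store(0, std::memory_order_relaxed);
                return t;
            }
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] {
                return sleeping.load(std::memory_order_relaxed) == 0;
            });
            // Woken (possibly spuriously): loop and re-check tail.
        }
    }
};

// Publishes n items from this thread while a second thread consumes them.
std::uint64_t run_demo(std::uint64_t n) {
    WakeChannel ch;
    std::uint64_t seen = 0;
    std::thread consumer([&] {
        while (seen < n) seen = ch.wait_past(seen);
    });
    for (std::uint64_t i = 0; i < n; ++i) ch.publish();
    consumer.join();
    assert(seen == n);
    return seen;
}
```

With the fences in place, at least one side always observes the other's store: either the producer sees `sleeping == 1` and wakes, or the consumer sees the new `tail` and never parks. The producer-side fence runs on every publish, including when the consumer is busy spinning, which is the hot-path cost that the "just a load, a mask, and a branch" description leaves out.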