codeworse 6 days ago

As far as I know, the last approach is the only way to implement efficient lock-free ring-buffer

zephen 3 days ago | parent | next [-]

The middle approach is the only one that is not lock-free.

The first approach is lock-free, but as the author says, it wastes an element.

But here's the thing. If your element is a character, and your buffer size is, say, 256 bytes, and you are using 8-bit unsigned characters for indices, the one wasted byte is less than one percent of your buffer space, and also is compensated for by the simplicity and reduced code size.

fullstop 3 days ago | parent [-]

I've used the "Waste an element" one for ages on microcontrollers where I don't want to deal with the overhead in an ISR.

zephen 3 days ago | parent [-]

Agreed.

The article author claims that the "don't waste an element" code is also more efficient, but that claim seems to rest on a fixation on the post-increment operator, rather than any kind of dive into the cyclomatic complexity, or even, y'know, just looking at the assembler output from the compiler.

mrcode007 4 days ago | parent | prev [-]

There is one more way that is truly lock free. Most lock-free implementations relying on atomic compare-and-swap instructions are not lock-free afaik; they take a lock on the cache line in the CPU (in a way, you go from one global lock to many distributed locks).

There is one more mechanism that allows implementing ring buffers without having to compare head and tail indices at all (and doesn't rely on counters or empty/full flags etc.); it piggybacks on the cache coherence protocol.

wat10000 4 days ago | parent | next [-]

Those hardware-level locks are typically not counted because they work quite differently. A standard software mutex can cause other threads to block indefinitely if, for example, the thread holding the mutex gets preempted for a long time. "Lock free" isn't really about the locks, it's about a guarantee that the system makes progress.

In this sense, the hardware locks used for atomic instructions don't really count, because they're implemented such that they can only be held for a brief, well defined time. There's no equivalent to suspending a thread while it holds a lock, causing all other threads to wait for an arbitrary amount of time.

dooglius 4 days ago | parent | prev | next [-]

That's not how "lock free" is defined/used. If you are considering the MESI M state to be a "lock" then you also have to grant that any write instruction is a "lock".

mrcode007 2 days ago | parent [-]

In fact this is the crux of the problem in low-latency code, and there are ways to combat it.

I know there are academic wait-free and lock-free definitions, but folks often use those incorrectly as a slogan, as if something were magically better because it's "lock-free".

Imagine how _you_ would implement a read-modify-write atomic in the CPU, and why E stands for Exclusive (sort of like the exclusive in a mutex).

spockz 4 days ago | parent | prev [-]

Interesting! Do you know of an example implementation of this?

mrcode007 2 days ago | parent [-]

Yes. [1] has background, [2] has the implementation (fig 2. pseudocode). Since you understood my comment I trust you can figure out the rest :) it’s a very neat trick.

[1] https://www.microsoft.com/en-us/research/publication/concurr...
[2] https://arxiv.org/pdf/1012.1824