roncohen 10 hours ago

As someone who spent ~8 months working on a hobby-level Rust-based Kafka alternative that used Raft for metadata coordination: nice work!

It wasn't immediately clear to me whether the data-plane replication also happens through Raft or through something home-rolled. Getting consistency and reliability right with something home-rolled is challenging.

Notes:

- Would love to see it in an S3-backed mode, either entirely diskless like WarpStream or as tiered storage.

- Love the simplified API. If possible, adding a Kafka-compatible API interface is probably worth it to connect to the broader ecosystem.

Best of luck!

seanhunter 9 hours ago | parent | next [-]

It says on the GitHub page

   " It provides fault-tolerant streaming with automatic leadership rotation, segment-based partitioning, and Raft consensus for metadata coordination."
So I guess that's a "yes" to Raft?
zbentley 9 hours ago | parent [-]

GP asked about data plane consensus, not metadata/control plane.

EdwardDiego 8 hours ago | parent [-]

They asked about data-plane replication, i.e., leader -> followers. Unless I misunderstood them.

nubskr 8 hours ago | parent | prev [-]

Hi, the creator here. I think an S3-backed storage mode is a good idea. It's kind of tricky to do for the 'active' block we're currently writing to, but totally doable for historical data.

As for the Kafka API: I tried to implement that earlier with a sort of `translation` layer, but it gets pretty complicated to maintain because Kafka is offset-based while walrus is message-based.
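To illustrate why such a layer is painful to maintain: a minimal, hypothetical sketch of mapping Kafka's dense per-partition offsets onto a message-ID-based log. The names and structure here are illustrative assumptions, not walrus's actual API.

```python
# Hypothetical translation layer: Kafka consumers fetch by a dense,
# monotonically increasing integer offset, so the layer must keep a
# per-partition mapping from offsets to opaque message IDs.

class OffsetTranslator:
    def __init__(self):
        self.offset_to_id = []  # list index == Kafka offset

    def record_append(self, message_id: str) -> int:
        """Called on every append; returns the Kafka offset to report."""
        self.offset_to_id.append(message_id)
        return len(self.offset_to_id) - 1

    def fetch(self, offset: int) -> str:
        """Resolve a Kafka fetch(offset) to the underlying message ID."""
        return self.offset_to_id[offset]

t = OffsetTranslator()
assert t.record_append("msg-a1") == 0
assert t.record_append("msg-b2") == 1
assert t.fetch(1) == "msg-b2"
```

The hard part is everything this sketch leaves out: the mapping has to survive restarts, replication, and any compaction or retention that deletes messages, at which point offsets are no longer a simple array index.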

EdwardDiego 8 hours ago | parent [-]

TBH I don't think anyone uses S3 for the active segment. I didn't dig into WarpStream too much, but I vaguely recall they only offloaded to S3 once the segment was rolled.

zellyn 6 hours ago | parent [-]

The Developer Voices interview where Kris Jenkins talks to Ryan Worl is one of the best, and goes into a surprising amount of detail: https://www.youtube.com/watch?v=xgzmxe6cj6A

tl;dr: they write to S3 once every 250ms to save costs. IIRC, they contend that when Kafka keeps things organized by writing a separate file per topic, it's really the Linux disk cache being clever that turns the tangle of disk-block arrangement into a clean per-file view. They wrote their own version of that, so they can cheaply checkpoint heavily interleaved chunks of data while their in-memory cache provides a clean per-topic view. I think maybe they clean up later asynchronously, but my memory fails me.
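The interleaving idea can be sketched roughly like this (my own illustration, not WarpStream's actual format): records from many topics go into one object per flush, with a small index recording where each topic's records landed, so readers can reassemble a clean per-topic view without one file per topic.

```python
# Illustrative sketch: interleave many topics' records into a single
# flushed blob, plus an index of (start, length) spans per topic.

def flush(batches):
    """batches: dict mapping topic -> list of byte records.
    Returns (blob, index); index maps topic -> [(start, length), ...]."""
    blob = bytearray()
    index = {}
    for topic, records in batches.items():
        for rec in records:
            index.setdefault(topic, []).append((len(blob), len(rec)))
            blob.extend(rec)
    return bytes(blob), index

def read_topic(blob, index, topic):
    """Reconstruct one topic's records from the interleaved blob."""
    return [blob[s:s + n] for s, n in index.get(topic, [])]

blob, idx = flush({"orders": [b"o1", b"o2"], "clicks": [b"c1"]})
assert read_topic(blob, idx, "orders") == [b"o1", b"o2"]
assert read_topic(blob, idx, "clicks") == [b"c1"]
```

One flush means one PUT regardless of how many topics it contains, which is the whole point when S3 bills per request.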

I don't know how BufStream works.

The thing that really stuck with me from that interview is the 10x cost reduction you can get by using S3, if you're willing and able to tolerate the higher latency and increased complexity. Apparently they implemented that inside Datadog first ("Labrador", I think?), and then did it again with WarpStream.

I highly recommend the whole episode (and the whole podcast, really).

nubskr 6 hours ago | parent [-]

S3 charges per 1,000 PUT requests; not sure how it's sustainable to do that every 250ms, TBH, especially in multi-tenant mode where you can have thousands of 'active' blocks being written to.
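Back-of-envelope arithmetic, assuming the commonly cited S3 Standard price of ~$0.005 per 1,000 PUT requests (check current pricing for your region):

```python
# PUT cost of flushing every 250ms, per active writer.
PUT_PRICE_PER_1000 = 0.005           # USD, S3 Standard, approximate

flushes_per_sec = 1 / 0.250          # 4 PUTs per second per writer
puts_per_month = flushes_per_sec * 86_400 * 30

# One writer flushing every 250ms, around the clock:
cost_one_writer = puts_per_month / 1000 * PUT_PRICE_PER_1000   # ~$51.84/month

# One active object per topic/partition multiplies that directly:
cost_1000_writers = cost_one_writer * 1000                     # ~$51,840/month
```

Which is why batching every tenant's data into one object per flush (one PUT per agent per interval, not per topic) matters so much for the economics.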

zellyn 5 hours ago | parent [-]

Guess it beats doing it every 250ms for every topic…