erulabs 4 days ago

Kafka's ability to ingest the firehose and present it as a throttle-able consumable to many different applications is great. If you're thinking "just use a database", it's worth noting that SQL databases are _not well suited_ to drinking from a firehose of writes, and that distributed SQL in 2012 was not a thing. Kafka was one of the first systems that fully embraced the dropping of the C from CAP theorem, which was a big step forward for web applications at scale. If you bristle at that, know that using read replicas of your Postgres database presents the same correctness problems.

These days though, unless I was at Fortune 100 scale, I'd absolutely turn to Redis Cluster Streams instead. So much simpler to manage and so much cheaper to run.

Also I like Kafka because I met two pretty Russian girls in San Francisco a decade back and the group we were in played a game where we described what the company we worked for did in the abstract, and then tried to guess the startup. They said "we write distributed streaming software", and I guessed "Confluent" immediately. At the time Confluent was quite new and small. Fun night. Fun era.

atombender 3 days ago | parent | next [-]

For a long time I've wondered if we could just invent an extension for Postgres that allows physically ordered, append-only tables.

The two main things that make Postgres less suitable for Kafka-type logs are that tables aren't very efficient for sequentially ordered data, and that deletion incurs bloat until vacuumed. You could solve both by providing a new table engine (table access method), although I'm not sure you can control heap storage placement to the degree desired for a physically ordered table. But you could also do a lot of tricks to make it delete faster (append-only means no updates; just prune from the head without MVCC when provably safe against concurrent reads?) and make filtering faster.

Kafka is of course more than that, but I bet you can get quite far with this.
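
To make the "append-only, prune from the head" idea concrete, here is a rough in-memory sketch in Python. It is not a Postgres table access method, and the class and method names (`AppendOnlyLog`, `append`, `read`, `prune`) are made up for illustration. It assumes "provably safe" simply means every registered consumer has already read past the records being dropped.

```python
class AppendOnlyLog:
    """Toy append-only log with per-consumer cursors (illustrative only)."""

    def __init__(self):
        self._base = 0       # offset of the oldest record still retained
        self._records = []   # records at offsets _base, _base + 1, ...
        self._cursors = {}   # consumer name -> next offset to read

    def append(self, payload):
        """Append a record and return its offset. There are no updates."""
        self._records.append(payload)
        return self._base + len(self._records) - 1

    def read(self, consumer, max_records):
        """Return up to max_records for this consumer and advance its cursor."""
        start = max(self._cursors.get(consumer, self._base), self._base)
        lo = start - self._base
        batch = self._records[lo:lo + max_records]
        self._cursors[consumer] = start + len(batch)
        return batch

    def prune(self):
        """Drop records from the head that every known consumer has read."""
        if not self._cursors:
            return 0
        safe = min(self._cursors.values())
        drop = max(safe - self._base, 0)
        del self._records[:drop]
        self._base += drop
        return drop
```

Each consumer reads at its own pace via `max_records`, and `prune()` only ever removes a prefix of the log, so there is nothing in the middle to vacuum.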

dxxvi 4 days ago | parent | prev | next [-]

> turn to Redis Cluster Streams instead. So much simpler to manage and so much cheaper to run

I don't have any experience with Redis Cluster Streams. Could you please tell us how it is simpler to manage? IMO, installing and managing a Kafka cluster at non-Fortune 100 scale is simple enough: run one Java command for ZooKeeper, then another Java command for a broker (with recent versions of Kafka, ZooKeeper is not needed anymore). The configuration files are not very simple but not very complicated either. When we have another machine, we can run another broker on it.

Is Redis Cluster Streams cheaper to run because it's written in C and doesn't need a VM to run? Or because its messages are stored in RAM, not on SSD?

erulabs 3 days ago | parent | next [-]

I haven’t used Kafka since the change to remove ZooKeeper; it’s likely they’re more or less on par now. Redis gets a win because most shops already have Redis: it’s already trusted and installed, just waiting for its first XADD.

3 days ago | parent | prev [-]
[deleted]
enether 3 days ago | parent | prev | next [-]

Love the story!

> Kafka was one of the first systems that fully embraced the dropping of the C from CAP theorem, which was a big step forward for web applications at scale.

Could you expand on this - when does it drop C? Are you referring to cases where you write to Kafka without waiting for all replicas to acknowledge the write? (acks=1)

And why was it a big step - what other systems didn't embrace dropping the C?

erulabs 3 days ago | parent [-]

Well, at the time (and this is still largely true), people were very insistent on ACID compliance in databases. Obviously this made sense for many applications, but it became a bottleneck at huge scale. Being able to be eventually consistent became a golden feature. It was worked around by using e.g. read replicas in production, as SQL replication gives up Consistency in favor of Availability. Kafka’s “acks=1” is part of the story, yes, but simply writing events to be eventually processed also accomplishes “dropping consistency”.

Native support for dropping consistency in SQL is tricky, see Transaction Isolation Levels, but I mostly mean in overall web architecture, rather than specifically in one database or the other.
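
For anyone wondering what the `acks` trade-off looks like in practice, here is a toy Python model. It is a deliberately simplified sketch, not Kafka's actual replication protocol: the `ToyPartition` class and its methods are hypothetical, followers are assumed to catch up only when `replicate()` is called, and `acks="all"` is assumed to mean "replicate to every follower before acknowledging".

```python
class ToyPartition:
    """Toy model of one replicated partition (not Kafka's real protocol)."""

    def __init__(self, follower_count):
        self.leader = []
        self.followers = [[] for _ in range(follower_count)]

    def replicate(self):
        """Copy any records the followers are missing from the leader."""
        for follower in self.followers:
            follower.extend(self.leader[len(follower):])

    def produce(self, record, acks):
        """Write to the leader; with acks="all", replicate before acking."""
        self.leader.append(record)
        if acks == "all":
            self.replicate()
        return True  # the producer sees an acknowledgement either way

    def fail_leader(self):
        """Lose the leader and promote the first follower in its place."""
        self.leader = self.followers.pop(0)
```

In this model, a record produced with `acks="1"` is acknowledged but disappears if the leader fails before replication, while one produced with `acks="all"` survives the failover. That gap between "acknowledged" and "on every replica" is the trade being discussed.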

jiggawatts 3 days ago | parent | prev | next [-]

> that SQL databases are _not well suited_ to drinking from a firehose of writes

Now I’m wondering if we’re all overthinking this when we could just use rendezvous hashing and a bunch of database servers with a heap table called “eventlog” and be done with it…
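
For reference, rendezvous (highest-random-weight) hashing is only a few lines. This is a minimal Python sketch; the server names and the `score` / `pick_server` helpers are placeholders, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from typing import Sequence


def score(server: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (server, key) pair."""
    digest = hashlib.sha256(f"{server}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(servers: Sequence[str], key: str) -> str:
    """Route a key to the server with the highest weight for that key."""
    return max(servers, key=lambda s: score(s, key))


# Hypothetical usage: choose which database holds a given event stream.
servers = ["db-a", "db-b", "db-c", "db-d"]
target = pick_server(servers, "user-42")
```

The useful property is that removing a server only moves the keys that were assigned to it; every other key keeps its existing home, so each "eventlog" table stays where it is.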

betaby 4 days ago | parent | prev [-]

> Kafka's ability to ingest the firehose and present it as a throttle-able consumable to many different applications is great.

I use Kafka precisely for that. Redis Cluster Streams has AOF persistence logs, as I see from the docs. How stable is it?