gooob 21 hours ago

wait what's wrong with kafka?

Boxxed 21 hours ago | parent | next [-]

I was in the midst of writing a snarky reply and then realized my actual issue with Kafka is that people reach for it way too often and use it in ways that don't really make sense.

Kind of like how people use docker for everything, when what you really should be doing is learning how to package software.

stackskipton 21 hours ago | parent [-]

Ops here, Docker is packaging software.

Agree on the Kafka thing though. I've seen so many devs trip over Kafka topics, partitions and offsets when their throughput is low enough that RabbitMQ would do fine.

marcosdumay 21 hours ago | parent [-]

No, docker is software for packaging systems.

The people distributing software should shut the damn up about how the rest of the system it runs in is configured. (But not you, your job is packaging full systems.)

That said, it seems to me that this is becoming less of a problem.

kevstev 20 hours ago | parent | prev | next [-]

Nothing inherently wrong with the core product IMHO. The issue is more with Confluent, who have been constantly swinging from hot buzzword to hot buzzword for the last few years in search of growth. Confluent cloud is very expensive, and you still have to deal with a surprising amount of scaling headaches. I have people I consider friends that work there, so I don't want to go too deep into their various missteps, but the Kafka ecosystem has been largely stagnant outside of getting rid of Zookeeper and simplifying operations/deployment. There have been some decent quality of life fixes, but the platform is very expensive, yet if you are really all-in on Kafka, you would be insane to not get support from Confluent- it can break in surprising ways.

So you are stuck with some really terrible tradeoffs- Go with Confluent Cloud, pay a fortune, and still likely have some issues to deal with. Or you could go with Confluent Platform, still have to pay people to operate it, while Confluent the company focuses most of their attention on Cloud and still charges you a fortune. Or you could just go completely OS and forgo anything Confluent and risk being really up the river when something inevitably breaks, or you have to learn the hard way that librdkafka has poor support for a lot of the shiny features discussed in the release notes.

Redpanda has surpassed them from a technical quality perspective, but Kafka has them beat on the ecosystem and the sheer inertia of moving from one platform to another. Kafka for example was built in a time of spinning rust hard disks, and expects to be run on general purpose compute nodes, where Redpanda will actually look at your hardware and optimize the number of threads it spawns for the box it is on, assuming it is going to be the only real app running there, which is true for anything but a toy deployment.

This is my experience from running platform teams and being head of messaging at multiple companies.

itslennysfault 21 hours ago | parent | prev | next [-]

What's wrong with kafka or what WILL BE wrong with kafka?

PeterCorless 4 hours ago | parent [-]

So much that we presume in the modern cloud wasn't a given when Apache Kafka was first released in 2011.

kevstev wrote just above about Kafka being written to run on spinning disks (HDDs), while Redpanda was written to take advantage of the latest hardware (local NVMe SSDs). He has some great insights.

As well, Apache Kafka was written in Java, back in an era when you weren't quite sure what operating system you might be running on. For example, when Azure first launched they had a Windows NT-based system called Windows Azure. Most everyone else had already decided to roll Linux. Microsoft refused to budge on Linux until 2014, and didn't release its own Azure Linux until 2020.

Once everyone decided to roll Linux, the "write once run everywhere" promise of Java was obviated. But because you were still locked into a Java Virtual Machine (JVM) your application couldn't optimize itself to the underlying hardware and operating system you were running on.

Redpanda, for example, is written in C++ on top of the Seastar framework (seastar.io). The same framework at the heart of ScyllaDB. This engine is a thread-per-core shared-nothing architecture that allows Redpanda to optimize performance for hardware utilization in ways that a Java app can only dream of. CPU utilization, memory usage, IO throughput. It's all just better performance on Redpanda.

It means that you're actually getting better utility out of the servers you deploy. Less wasted / fallow CPU cycles — so better price-performance. Faster writes. Lower p99 latencies. It's just... better.

Now, I am biased. I work at Redpanda now. But I've been a big fan of Kafka since 2015. I am still bullish on data streaming. I just think that Apache Kafka, as a Java-based platform, needs some serious rearchitecture.

Even Confluent doesn't use vanilla Kafka. They rewrote their own engine, Kora. They claim it is 10x faster. Or 30x faster. Depending on what you're measuring.

1. https://www.confluent.io/confluent-cloud/kora/

2. https://www.confluent.io/blog/10x-apache-kafka-elasticity/

itsanaccount 20 hours ago | parent | prev [-]

https://en.wikipedia.org/wiki/Enshittification is helpful if you aren't aware of how late stage capitalism works

philipallstar 19 hours ago | parent [-]

Late stage of what?