itslennysfault 21 hours ago

What's wrong with kafka or what WILL BE wrong with kafka?

PeterCorless 4 hours ago

So much that we presume in the modern cloud wasn't a given when Apache Kafka was first released in 2011.

kevstev wrote just above about Kafka being written to run on spinning disks (HDDs), while Redpanda was written to take advantage of the latest hardware (local NVMe SSDs). He has some great insights.

As well, Apache Kafka was written in Java, back in an era when you weren't quite sure what operating system you might be running on. For example, when Azure first launched they had a Windows NT-based system called Windows Azure. Most everyone else had already decided to roll Linux. Microsoft refused to budge on Linux until 2014, and didn't release its own Azure Linux until 2020.

Once everyone decided to roll Linux, the "write once run everywhere" promise of Java was obviated. But because you were still locked into a Java Virtual Machine (JVM), your application couldn't optimize itself for the underlying hardware and operating system you were running on.

Redpanda, for example, is written in C++ on top of the Seastar framework (seastar.io), the same framework at the heart of ScyllaDB. This engine is a thread-per-core, shared-nothing architecture that allows Redpanda to optimize hardware utilization in ways that a Java app can only dream of: CPU utilization, memory usage, IO throughput. It's all just better performance on Redpanda.
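To make "thread-per-core shared-nothing" a bit more concrete, here is a minimal Python sketch of the ownership idea only. It is not Seastar's or Redpanda's API: the names `Shard`, `shard_for`, `append`, `demo`, and `NUM_SHARDS` are all made up for illustration. Each worker exclusively owns its slice of the data, and every other thread reaches it only by passing a message to its inbox. Python threads are not pinned to cores, and `queue.Queue` uses locks internally, so this shows the routing and ownership pattern, not the performance characteristics.

```python
import queue
import threading
import zlib

NUM_SHARDS = 4  # hypothetical; a real engine would run one shard per CPU core


class Shard(threading.Thread):
    """One worker that exclusively owns its slice of the data."""

    def __init__(self, shard_id):
        super().__init__(daemon=True)
        self.shard_id = shard_id
        self.inbox = queue.Queue()  # the only way other threads reach this shard
        self.log = {}               # topic -> records, touched only by this thread

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:         # shutdown sentinel
                break
            topic, record, reply = msg
            entries = self.log.setdefault(topic, [])
            entries.append(record)
            # Reply with (owning shard, offset of the appended record).
            reply.put((self.shard_id, len(entries) - 1))


def shard_for(topic):
    # Stable hash, so a given topic always maps to the same owning shard.
    return zlib.crc32(topic.encode("utf-8")) % NUM_SHARDS


def append(shards, topic, record):
    # Route the write to the owner as a message; the caller never touches its data.
    reply = queue.Queue(maxsize=1)
    shards[shard_for(topic)].inbox.put((topic, record, reply))
    return reply.get()


def demo():
    shards = [Shard(i) for i in range(NUM_SHARDS)]
    for s in shards:
        s.start()
    results = [append(shards, "orders", f"event-{i}") for i in range(3)]
    for s in shards:
        s.inbox.put(None)
    for s in shards:
        s.join()
    return shards, results
```

Calling `demo()` appends three records to a single topic; all of them land on the one shard that owns that topic, and the other shards' state is never touched. The design point is that no two workers ever contend on the same data structure, which is what lets an engine built this way avoid cross-core locking on the hot path.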

It means that you're actually getting better utility out of the servers you deploy. Fewer wasted / fallow CPU cycles, so better price-performance. Faster writes. Lower p99 latencies. It's just... better.

Now, I am biased. I work at Redpanda now. But I've been a big fan of Kafka since 2015. I am still bullish on data streaming. I just think that Apache Kafka, as a Java-based platform, needs some serious rearchitecture.

Even Confluent doesn't use vanilla Kafka. They wrote their own engine, Kora [1]. They claim it is 10x faster, or 30x faster, depending on what you're measuring [2].

1. https://www.confluent.io/confluent-cloud/kora/

2. https://www.confluent.io/blog/10x-apache-kafka-elasticity/