| ▲ | spyspy 21 hours ago |
| I'm still convinced the vast majority of Kafka implementations could be replaced with `SELECT * FROM mytable ORDER BY timestamp ASC` |
|
| ▲ | Romario77 20 hours ago | parent | next [-] |
| Pull vs push. Plus, once you start storing the last timestamp so you only select the delta, then sharding your DB and dealing with the complexities of clocks differing across tables and of replication lag, it quickly becomes evident that Kafka is better in this regard. But yeah, for a lot of implementations you don't need streaming. For pull-based apps you design your architecture differently: some things are a lot easier than with a DB, some are harder. |
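A minimal sketch of that delta-polling pattern (table, column, and handler names are hypothetical; SQLite stands in for any SQL database), mostly to show where the timestamp bookkeeping creeps in:

    import sqlite3
    import time

    conn = sqlite3.connect("events.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT)"
    )

    def handle(payload):
        print(payload)  # stand-in for real processing

    last_ts = 0  # a real consumer would persist this between runs

    while True:
        # Fetch only the delta since the last poll.
        rows = conn.execute(
            "SELECT ts, payload FROM events WHERE ts > ? ORDER BY ts ASC",
            (last_ts,),
        ).fetchall()
        for ts, payload in rows:
            handle(payload)
            last_ts = ts  # breaks once shards/replicas disagree on time
        time.sleep(1)  # poll interval: the latency vs load trade-off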
| |
| ▲ | ahoka 19 hours ago | parent [-] | | Funny you mention that, because Kafka consumers actually pull messages. | | |
| ▲ | politelemon 15 hours ago | parent | next [-] | | What is the reason for using Kafka then? Sorry if I'm missing something fundamental. | | |
| ▲ | pram 10 hours ago | parent [-] | | A Kafka consumer does a lot of work coordinating distributed clients in a group, managing the current offset, balancing the readers across partitions, etc., which is native broker functionality. Saying you can replace it all with a simple JDBC client or something isn't true (if you need that stuff!). |
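For contrast, a minimal consumer-group sketch using the kafka-python package (topic, group, and broker address are made up); partition assignment, rebalancing, and offset tracking all happen broker-side rather than in this code:

    from kafka import KafkaConsumer  # pip install kafka-python

    # Joining a group is all it takes: the broker coordinates partition
    # assignment across group members and tracks committed offsets.
    consumer = KafkaConsumer(
        "orders",                           # hypothetical topic
        group_id="billing",                 # hypothetical consumer group
        bootstrap_servers="localhost:9092",
        enable_auto_commit=True,            # offsets committed for you
        auto_offset_reset="earliest",
    )

    for msg in consumer:
        print(msg.partition, msg.offset, msg.value)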
| |
| ▲ | ycombinatrix 12 hours ago | parent | prev [-] | | Not by busy-waiting in a loop on a database query, though. |
|
|
|
| ▲ | fatal94 21 hours ago | parent | prev | next [-] |
| Sure, if you're working on a small homelab with minimal to no processing volume. The second you approach any kind of scale, this falls apart and/or you end up with a more expensive and worse version of Kafka. |
| |
| ▲ | devnull3 20 hours ago | parent | next [-] | | I think there is a wide spectrum between small-homelab and Google scale. I was surprised how far SQLite goes, with some sharding on modern SSDs, for those in-between-scale services/SaaS. | | |
| ▲ | fatal94 20 hours ago | parent [-] | | What you're doing is fine for a homelab, or for learning. But barring any very specific reason other than just not liking Kafka, it's bad. The second that pattern needs to be fanned out to support even 50+ producers/consumers, the overhead and complexity needed to manage already-solved problems becomes a very bad design choice. Kafka already solves this problem and gives me message durability, near-infinite scale-out, sharding, delivery guarantees, etc. out of the box. I do not care to develop this, reshard databases, or productionize it myself. | | |
| ▲ | NewJazz 18 hours ago | parent | next [-] | | Some people don't and won't need 50+ producers/consumers for a long while, if ever. Rewriting the code at that point may be less costly than operating Kafka in the interim. Kafka also has a higher potential for failure than SQLite. | | |
| ▲ | fatal94 17 hours ago | parent | next [-] | | Ofc, and not everybody needs or cares for all the features Kafka has. Then use another known and tested messaging system: NATS, ZMQ, or any cloud-native pub/sub system. My main point is, I have zero interest in creating novel solutions to a solved problem. It just artificially increases the complexity of my work and the learning curve for contributors. | |
| ▲ | umanwizard 15 hours ago | parent | prev [-] | | Okay, then those people don’t have to use Kafka. What is your point? | | |
| ▲ | NewJazz 15 hours ago | parent [-] | | I was responding to someone who was responding to someone that wasn't using Kafka telling them to use Kafka. What's yours? |
|
| |
| ▲ | CyberDildonics 18 hours ago | parent | prev [-] | | SQLite can do 40,000 transactions per second; that's going to cover a lot more than a homelab. Not everything needs to be big and complicated. |
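A rough way to sanity-check that figure on your own hardware (a sketch; results vary a lot with journal mode, synchronous setting, and whether inserts are batched per transaction):

    import sqlite3
    import time

    conn = sqlite3.connect("bench.db")
    conn.execute("PRAGMA journal_mode=WAL")    # write-ahead log helps write throughput
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs (may drop the newest commits on power loss)
    conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)")

    n = 10_000
    start = time.time()
    for i in range(n):
        with conn:  # one transaction per insert, the pessimistic case
            conn.execute("INSERT INTO t (v) VALUES (?)", (f"row{i}",))
    print(f"{n / (time.time() - start):,.0f} single-insert transactions/sec")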
|
| |
| ▲ | raverbashing 20 hours ago | parent | prev [-] | | "Any kind of scale"? No, there's a long stretch of better and more straightforward solutions before you get there. The simple SELECT goes a long way, e.g. `SELECT * FROM events WHERE timestamp > :last_ts ORDER BY timestamp LIMIT 50`. |
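One refinement along those lines (a sketch; table and column names are hypothetical): paging on a composite (timestamp, id) cursor resumes after the exact last row seen, so rows sharing a timestamp aren't skipped the way a bare `timestamp > last_ts` filter can skip them:

    import sqlite3

    def fetch_batch(conn, last_ts, last_id, limit=50):
        # Row-value comparison needs SQLite >= 3.15 (Postgres supports it too).
        return conn.execute(
            "SELECT id, ts, payload FROM events "
            "WHERE (ts, id) > (?, ?) "
            "ORDER BY ts, id LIMIT ?",
            (last_ts, last_id, limit),
        ).fetchall()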
|
|
| ▲ | hawk_ 15 hours ago | parent | prev | next [-] |
| Yes, but try putting that on your CV. |
|
| ▲ | devnull3 21 hours ago | parent | prev [-] |
| That is exactly what I am doing with SQLite: keep a table-level seqno, a monotonically increasing number stamped on every mutation. When a subscriber connects, it asks for rows with a seqno greater than the last one it handled, roughly as in the sketch below. |
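A minimal sketch of that scheme, assuming an append-only events table where each mutation is recorded as a new row (all names are made up):

    import sqlite3

    conn = sqlite3.connect("stream.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "  seqno   INTEGER PRIMARY KEY AUTOINCREMENT,"  # monotonic per table
        "  payload TEXT NOT NULL)"
    )

    def publish(payload):
        # Every mutation appends a row; seqno is stamped automatically.
        with conn:
            conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))

    def fetch_since(last_seqno, limit=100):
        # A subscriber resumes from the last seqno it handled.
        return conn.execute(
            "SELECT seqno, payload FROM events WHERE seqno > ? "
            "ORDER BY seqno LIMIT ?",
            (last_seqno, limit),
        ).fetchall()

One caveat: with concurrent writers, a reader can observe seqno N+1 committed before N and then skip N, so a single writer (or a small re-read window) keeps the "rows > last seqno" contract honest.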