fifilura 3 days ago
I wanted to write a comment on this topic, but after several tries this thread is where I ended up, because it describes my sentiment as well. The arguments in the article are very compelling, but as soon as you choose Kafka you realize the things you hate.

Many of the reasons are small things: it uncovers otherwise unimportant bugs in your client code; it makes experimenting a hassle because it forces you to poke around in lots of different places to do anything; writing and maintaining the compulsory integration tests takes weeks of your time. Sure, you can replay your data, but not until you have fixed all the issues for that special case in your receiving service.

Maybe my main gripe (for us) was that it was difficult to get an understanding of what is actually inside your pipe. Much easier to have that in a solid state in S3?

At the end of the day you get annoyed because it slows you down, in particular when you are a small, localized team.
physicles 3 days ago | parent
Totally agree with this. I’ll add that replaying your data needs special tooling to 1) find the correct offsets on each topic, and 2) spin up whatever daemon will consume that data out-of-band from normal processing, then shut it down when it’s done.

I don’t remember where I read this, but someone made the observation that writing a stream processing system is about 3x harder than writing a batch system, for exactly the reasons you mentioned.

I’m looking at replacing some of our Kafka usage with a ClickHouse table that’s ordered and partitioned by insertion time, because if I want to do stuff with that data stream, at least I can run a damn SQL query.
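To make point 1 concrete: "find the correct offsets" means mapping a replay-from timestamp to a starting offset on every partition. Kafka clients expose a broker-side lookup for this (e.g. `offsets_for_times` in the Python clients), but the shape of the problem is just a binary search over a per-partition timestamp index. This is a self-contained sketch with made-up index data, not real client code:

```python
import bisect

# Hypothetical per-partition index: sorted lists of (timestamp_ms, offset).
# In a real deployment the broker answers this lookup for you; this only
# illustrates the lookup your replay tooling has to perform per partition.
partition_index = {
    0: [(1000, 0), (2000, 10), (3000, 20)],
    1: [(1500, 0), (2500, 5), (3500, 12)],
}

def offsets_for_replay(index, since_ms):
    """Return, for each partition, the first offset at or after since_ms."""
    result = {}
    for partition, entries in index.items():
        timestamps = [ts for ts, _ in entries]
        i = bisect.bisect_left(timestamps, since_ms)
        if i < len(entries):
            result[partition] = entries[i][1]
        # Partitions with no messages at/after since_ms are omitted,
        # mirroring the "no offset found" case in the real lookup.
    return result

print(offsets_for_replay(partition_index, 2000))  # → {0: 10, 1: 5}
```

And that's only step 1; you still need to run a separate consumer group from those offsets and tear it down when it catches up, which is exactly the tooling burden described above.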