BiraIgnacio 4 days ago

It was created to teach me the concept of love-hate relationships

fifilura 3 days ago | parent [-]

I wanted to write a comment on this topic, but after several tries this thread is where I ended up because it describes my sentiment as well.

The arguments in the article are very compelling. But as soon as you choose Kafka you realize the things you hate.

Many of the reasons are stupid things, like it uncovering otherwise unimportant bugs in your client code. Or that it makes experimenting a hassle, because it forces you to poke around in lots of different places to get anything done. Or that writing and maintaining the compulsory integration tests takes weeks of your time.

Sure - you can replay your data - but not until you have fixed all the issues for that special case in your receiving service.

I think maybe my main gripe (for us) was that it was difficult to get an understanding of what is actually inside your pipe. Much easier to have that in a solid state in S3?

At the end of the day you get annoyed because it slows you down. In particular when you are a small, localized team.

physicles 3 days ago | parent [-]

Totally agree with this. I’ll add that replaying your data needs special tooling to 1) find the correct offsets on each topic, and 2) spin up whatever daemon will consume that data out-of-band from normal processing, and shut it down when completed.

I don’t remember where I read this, but someone made the observation that writing a stream processing system is about 3x harder than writing a batch system, exactly for all the reasons you mentioned. I’m looking at replacing some of our Kafka usage with a ClickHouse table that’s ordered and partitioned by insertion time, because if I want to do stuff with that data stream, at least I can do a damn SQL query.
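
To make the "at least I can do a SQL query" point concrete, here is a rough sketch of what a replay looks like once the stream lives in a table keyed by insertion time. It uses Python's built-in sqlite3 purely as a stand-in for ClickHouse, and the `events` table, its columns, and the `replay` helper are all made up for illustration. A real ClickHouse table would declare its ordering and partitioning in the table definition instead of relying on an index.

```python
import sqlite3


def make_events_table(conn):
    # Hypothetical schema: one row per message, stamped with its insertion time.
    conn.execute(
        "CREATE TABLE events ("
        " inserted_at INTEGER NOT NULL,"  # insertion time, epoch seconds
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    # Stand-in for "ordered by insertion time": an index on the timestamp.
    conn.execute("CREATE INDEX events_by_time ON events (inserted_at)")


def replay(conn, topic, start, end):
    """Return payloads for one topic inserted in [start, end), oldest first."""
    rows = conn.execute(
        "SELECT payload FROM events"
        " WHERE topic = ? AND inserted_at >= ? AND inserted_at < ?"
        " ORDER BY inserted_at, rowid",  # rowid breaks ties in insertion order
        (topic, start, end),
    )
    return [payload for (payload,) in rows]


conn = sqlite3.connect(":memory:")
make_events_table(conn)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (100, "orders", "a"),
        (105, "orders", "b"),
        (105, "clicks", "x"),
        (110, "orders", "c"),
    ],
)

# "Replaying" a window is just a range query, with no offsets to look up
# and no out-of-band consumer to start and stop.
replayed = replay(conn, "orders", 100, 110)
print(replayed)
```

The same query also answers the "what is actually inside the pipe" question from upthread: dropping the topic filter and counting rows per topic gives a quick inventory of the window.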

fifilura 3 days ago | parent [-]

Yes I'll happily extend that to 10x more difficult.

At least compared to building a batched pipeline with SQL. I think you should really think hard whether you really need a streaming pipeline. And even if you find that you do, it may be worthwhile to make a batched pipeline as your first implementation.

I did exactly what you describe in my previous job. In the beginning there was reluctance from our architects, who wanted to keep beating the dead horse and did not understand the power of SQL: "SQL is not real programming, engineers write Java" (OK, maybe I deserve a straw-man yellow card here; they don't deserve all of that). But I think they understood after a while.

We did it with AWS Athena and Airflow. Good luck, consider me your distant moral support.