CharlieDigital 20 hours ago
I've implemented a distributed worker system on top of this paradigm. I used ZMQ to connect nodes and the worker nodes would connect to an indexer/coordinator node that effectively did a `SELECT FROM ORDER BY ASC`. It's easier than you may think and the bits here ended up with probably < 1000 SLOC all told.
Effectively a distributed event loop with the events queued up via a simple SQL query. Dead simple design, extremely robust, very high throughput, very easy to scale workers both horizontally (more nodes) and vertically (more threads). ZMQ made it easy to connect the remote threads to the centralized coordinator. It was effectively "self balancing" because the workers would only re-queue their thread once it finished work. Very easy to manage, but did not have hot failovers since we kept the materialized, "2D" work queue in memory. Though very rarely did we have issues with this.
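For readers who want to see the shape of this, here is a minimal single-process sketch of the pull-based pattern described above. It is not the original system: the ZMQ sockets are swapped for in-process `queue.Queue` objects, the workers are threads rather than remote nodes, and the `events` table, its columns, and the `worker`/`run` function names are all hypothetical. The in-memory "2D" work queue is not modeled. What it does show is the coordinator picking the oldest unclaimed event with an `ORDER BY ... ASC` query and handing it only to a worker that has announced itself as idle.

```python
import queue
import sqlite3
import threading


def worker(name, ready, done):
    """Hypothetical worker: announces itself as idle, then waits for one event."""
    inbox = queue.Queue()
    while True:
        ready.put(inbox)              # re-queue only once the previous job is finished
        event = inbox.get()
        if event is None:             # shutdown signal from the coordinator
            break
        event_id, payload = event
        done.put((event_id, name, payload.upper()))  # stand-in for real work


def run(payloads, n_workers=3):
    """Hypothetical coordinator: dispatches events in ascending id order."""
    db = sqlite3.connect(":memory:")  # only touched from this (coordinator) thread
    db.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, payload TEXT, claimed INTEGER NOT NULL DEFAULT 0)"
    )
    db.executemany("INSERT INTO events (payload) VALUES (?)", [(p,) for p in payloads])

    ready, done = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(f"w{i}", ready, done))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()

    dispatched = []
    while True:
        row = db.execute(
            "SELECT id, payload FROM events WHERE claimed = 0 ORDER BY id ASC LIMIT 1"
        ).fetchone()
        if row is None:               # nothing left to hand out
            break
        inbox = ready.get()           # blocks until some worker is idle
        db.execute("UPDATE events SET claimed = 1 WHERE id = ?", (row[0],))
        inbox.put(row)
        dispatched.append(row[0])

    # Each worker re-queues exactly once more after its last job; stop them all.
    for _ in threads:
        ready.get().put(None)
    for t in threads:
        t.join()

    results = [done.get() for _ in dispatched]
    return dispatched, results
```

Because a worker only puts its inbox back on `ready` after finishing, a slow worker simply asks for work less often, which is the "self balancing" behavior described above. Calling `run(["a", "b", "c"], n_workers=2)` dispatches ids 1, 2, 3 in that order; which worker handles which event depends on thread scheduling.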
ahoka 19 hours ago
Yeah, but that's like doing actual engineering. Instead you should just point to Kafka and say that it's going to make your horrible architecture scale magically. That's how the pros do it.
kerblang 13 hours ago
Kafka is really not intended to improve on this. Instead, it's intended for very high-volume ETL processing, where a classical message queue delivering records would spend too much time on locking. Kafka is hot-rodding the message queue design and removing guard rails to get more messages through faster. Generally I say, "Message queues are for tasks, Kafka is for data." But in the latter case, if your data volume is not huge, a message queue for async ETL will do just fine and give better guarantees as far as FIFO ordering goes. In essence, Kafka is a very specialized version of much more general-purpose message queues, which should be your default starting point. It's similar to replacing a SQL RDBMS with some kind of special NoSQL system: if you need it, okay, but otherwise the general-purpose default is usually the better option.