prhn 11 hours ago

This is surprisingly basic knowledge for ending up on the front page.

It’s a good intro, but I’d love to read more about when to know it’s time to replace my synchronous inter-service HTTP requests with a queue. What metrics should I consider, and what are the trade-offs? I’ve learned some answers to this question over time, but these guys are theoretically message queue experts. I’d love to learn about more things to look out for.

There are also different types of queues/exchanges, and the right choice is critical depending on the type and number of consumers you have. Should I use direct, fanout, etc.?

The next interesting question is when should I use a stream instead of a queue, which RabbitMQ also supports.

My advice, having just migrated a set of message queues and streams from AWS (ActiveMQ) to RabbitMQ, is: think long and hard before you add one. They become a black box of sorts and are way harder to debug than simple HTTP requests.

Also, as others have pointed out, there are other important use cases for queues which come way before microservice comms. Async processing to free up servers is one. I’m surprised none of these were mentioned.

Aurornis 11 hours ago | parent | next [-]

> This is surprisingly basic knowledge for ending up on the front page.

Nothing wrong with that! Hacker News has a large audience of all skill levels. Well written explainers are always good to share, even for basic concepts.

p1anecrazy 10 hours ago | parent | next [-]

In principle, I agree, but “a message queue is… a medium through which data flows from a source system to a destination system” feels like a truism.

sigbottle 9 hours ago | parent [-]

For me, I've realized I often cannot possibly learn something if I can't compare it to something prior first.

In this case, as another user mentioned, the decoupling use case is a great one. Instead of two processes/APIs talking to each other directly, having an intermediate "buffer" process/API can save you headaches.
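
As a toy sketch of that idea (made-up names, with an in-process Python queue standing in for a real broker): the producer hands work to the buffer and moves on, and the consumer drains it at its own pace.

    import queue
    import threading
    import time

    buffer = queue.Queue(maxsize=100)        # the intermediate "buffer" between the two sides

    def producer():
        for i in range(5):
            buffer.put({"order_id": i})      # hand off and move on; no waiting on the consumer
            print("enqueued", i)

    def consumer():
        while True:
            msg = buffer.get()
            time.sleep(0.1)                  # simulate a slow downstream service
            print("processed", msg["order_id"])
            buffer.task_done()

    threading.Thread(target=consumer, daemon=True).start()
    producer()
    buffer.join()                            # block until everything enqueued has been handled

With a real message queue the buffer also survives one side being down for a while, which is where the decoupling really pays off.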

nyrikki 8 hours ago | parent [-]

To add to this,

The concept of connascence, rather than coupling, is what I find more useful for trade-off analysis.

Synchronous connascence means that you only have a single architectural quantum, in Neal Ford’s terminology.

As Ford is less religious and more respectful of real-world trade-offs, I find his writings more useful for real-world problems.

I encourage people to check his books out and see if they are useful. It was always hard to mention connascence, as it has a reputation of being ivory-tower architect jargon, but in a distributed systems world it is very pragmatic.

coronapl 10 hours ago | parent | prev [-]

Agree! In fact, I would appreciate more well written articles explaining basic concepts on the front page of Hacker News. It is always good to revisit some basic concepts, but it is even better to relearn them. I am surprised by how often I realize that my definition of a concept is wrong or just superficial.

SAI_Peregrinus an hour ago | parent [-]

Also it's nice to have a set of well-written explainers for when someone asks about a concept.

chasil 10 hours ago | parent | prev | next [-]

This has more depth on System V/POSIX IPC, plus a YouTube video.

https://www.softprayog.in/programming/interprocess-communica...

Fun fact: IPC was introduced in "Columbus UNIX."

https://en.wikipedia.org/wiki/CB_UNIX

arter45 8 hours ago | parent | prev | next [-]

> but I’d love to read more about when to know it’s time to replace my synchronous inter-service HTTP requests with a queue. What metrics should I consider, and what are the trade-offs? I’ve learned some answers to this question over time, but these guys are theoretically message queue experts. I’d love to learn about more things to look out for.

Not OP but I have some background on this.

An Erlang loss system is like a set of phone lines. Imagine a special call center where you have N operators, each of whom takes a call, talks for some time (serving the customer) and hangs up. Unlike many call centers, however, they don’t keep you waiting on hold. Therefore, if all operators are busy the system hangs up and you have to explicitly call again. This is somewhat similar to a server with N threads.

Let's assume N=3.

Under common mathematical assumptions (Poisson arrivals, i.e. exponentially distributed times between arrivals at a constant average rate, and exponentially distributed service times) you can define:

1) “traffic intensity” (rho) is the ratio between the arrival rate and the service rate (intuitively, how “heavy” arrivals are with respect to “departures”)

2) the blocking probability is given by the Erlang B formula (sorry, not easy to write here; a rough sketch in code follows) for parameters N (number of threads) and rho (traffic intensity). Basically, if traffic intensity = 1 (arrival rate = service rate), the blocking probability is 6.25%. If the service rate is twice the arrival rate, this drops to approximately 1.3%. If the service rate is 1/10 of the arrival rate, the blocking probability is about 73%.
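
For the curious, here is a rough sketch of the Erlang B recursion in Python (my own illustration; the function and variable names are just for this example):

    # Erlang B blocking probability via the standard recursion:
    #   B(0) = 1,  B(k) = rho * B(k-1) / (k + rho * B(k-1))
    def erlang_b(n_servers, rho):
        b = 1.0                        # with zero servers, every arrival is blocked
        for k in range(1, n_servers + 1):
            b = rho * b / (k + rho * b)
        return b

    # N = 3 threads, matching the cases above
    print(erlang_b(3, 1.0))    # rho = 1   -> 0.0625  (6.25%)
    print(erlang_b(3, 0.5))    # rho = 0.5 -> ~0.013  (~1.3%)
    print(erlang_b(3, 10.0))   # rho = 10  -> ~0.73   (~73%)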

I will try to write down part 2 when I find some time.

EDIT - Adding part 2

So, let's add a buffer. We said we have three threads, right? Let's say the system can handle up to 6 requests before dropping: one in service on each of the 3 threads plus 3 buffered requests. Under the same distribution assumptions, this is known as an M/M/3/6 queue.

Some math crunching under the previous service and arrival rate scenarios:

- if the service rate equals the arrival rate, the blocking probability drops to about 0.2%. Of course there is now a non-zero wait probability (close to 9%).

- if the service rate is twice the arrival rate, the blocking probability is 0.006% and there is a wait probability of about 1.5%.

- if the service rate is 1/10 of the arrival rate, the blocking probability is 70% and the waiting probability is 29%.

This means that a buffer reduces request drops due to busy resources, but also introduces a waiting probability. Pretty obvious. Another obvious thing is that you need additional memory for that queue length. Assuming queue length = 3, and 1 KB messages, you need 3 KB of additional memory.
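
For completeness, the probabilities above come out of the standard M/M/c/K steady-state formulas; a rough sketch in Python (again my own illustration; an arrival is blocked when it finds the system full, and waits when all servers are busy but a buffer slot is free):

    from math import factorial

    def mmck_probs(c, k, rho):
        """Blocking and waiting probabilities for an M/M/c/K queue."""
        # Unnormalized steady-state probabilities p_n for n = 0..K
        coeffs = []
        for n in range(k + 1):
            if n <= c:
                coeffs.append(rho ** n / factorial(n))
            else:
                coeffs.append(rho ** n / (factorial(c) * c ** (n - c)))
        total = sum(coeffs)
        p = [x / total for x in coeffs]
        p_block = p[k]           # arrival finds the system full
        p_wait = sum(p[c:k])     # arrival finds all servers busy but a buffer slot free
        return p_block, p_wait

    # M/M/3/6: 3 threads plus 3 buffer slots
    print(mmck_probs(3, 6, 1.0))    # rho = 1   -> ~0.2% blocking, ~9% waiting
    print(mmck_probs(3, 6, 0.5))    # rho = 0.5 -> ~0.006% blocking, ~1.5% waiting
    print(mmck_probs(3, 6, 10.0))   # rho = 10  -> ~70% blocking, ~29% waiting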

A less obvious thing is that you are adding a new component. Assuming "in series" behavior, i.e. requests cannot be processed when the buffer system is down, this decreases overall availability if the queue is not properly sized. What I mean is that, if the system crashes when more than 4 KB of memory is used by the process, but you allow queue sizes up to 3 (3 KB in service + 3 KB buffered = 6 KB), availability is not 100%, because in some cases the system accepts more requests than it can actually handle.

An even less obvious point is that the availability picture changes if you consider the server and the buffer as having distinct "size" (memory) thresholds. Things get even more complicated if the server and buffer are connected by a link which itself doesn't have 100% availability, because you also have to take the link's unavailability into account.
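
As a back-of-the-envelope illustration (numbers entirely made up), availabilities of components in series multiply, so the chain ends up weaker than any single piece:

    # Hypothetical per-component availabilities; real figures come from measurement.
    a_link = 0.999       # network link between producer and buffer
    a_buffer = 0.999     # the queue/broker process
    a_server = 0.999     # the server doing the actual work

    # "In series": a request only succeeds if every component on its path is up.
    a_end_to_end = a_link * a_buffer * a_server
    print(round(a_end_to_end, 4))    # ~0.997, lower than any single component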

SpaceManNabs 8 hours ago | parent | prev [-]

I think the article would be a little bit more useful to non-beginners if it included an update on the modern landscape of MQs. Are people still using Apache Kafka lol?

it is a fine enough article as it is though!

deepsun 2 hours ago | parent [-]

Kafka is a distributed log system. Yes, people use Kafka as a message queue, but it's often the wrong tool for the job; it wasn't designed for that.