arter45 8 hours ago

> but I’d love to read more about when to know it’s time to replace my synchronous inter service http requests with a queue. What metrics should I consider and what are the trade offs. I’ve learned some answers to this question over time, but these guys are theoretically message queue experts. I’d love to learn about more things to look out for.

Not OP but I have some background on this.

An Erlang loss system is like a set of phone lines. Imagine a special call center with N operators, each of whom takes a call, talks for some time (serving the customer) and hangs up. Unlike many call centers, however, they don’t put you on hold. Therefore, if all operators are busy the system hangs up and you have to explicitly call again. This is somewhat similar to a server with N threads.

Let's assume N=3.

Under common mathematical assumptions (constant arrival rate, Poisson arrivals, i.e. exponentially distributed times between arrivals, and exponentially distributed service times) you can define:

1) “traffic intensity” (rho) is the ratio between the arrival rate and the service rate (intuitively, how “heavy” arrivals are with respect to “departures”)

2) the blocking probability is given by the Erlang B formula (sorry, not easy to write inline, but see the sketch below) for parameters N (number of threads) and rho (traffic intensity). Basically, if traffic intensity = 1 (arrival rate = service rate), the blocking probability is 6.25%. If the service rate is twice the arrival rate, this drops to approximately 1%. If the service rate is 1/10 of the arrival rate, the blocking probability is roughly 73%.
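
For reference, the Erlang B formula is B(N, rho) = (rho^N / N!) / (sum for k = 0..N of rho^k / k!). Here is a minimal Python sketch (the function name erlang_b is mine) that reproduces the three numbers above:

    from math import factorial

    def erlang_b(n_servers, rho):
        """Erlang B blocking probability:
        B(N, rho) = (rho^N / N!) / sum_{k=0}^{N} rho^k / k!"""
        top = rho ** n_servers / factorial(n_servers)
        bottom = sum(rho ** k / factorial(k) for k in range(n_servers + 1))
        return top / bottom

    print(erlang_b(3, 1.0))   # 0.0625   -> 6.25%
    print(erlang_b(3, 0.5))   # ~0.0127  -> roughly 1%
    print(erlang_b(3, 10.0))  # ~0.732   -> roughly 73%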

I will try to write down part 2 when I find some time.

EDIT - Adding part 2

So, let's add a buffer. We said we have three threads, right? Let's say the system can handle up to 6 requests before dropping: 1 being processed by each thread, plus an additional 3 buffered requests. Under the same distribution assumptions, this is known as an M/M/3/6 queue.

Some math crunching under the previous service and arrival rate scenarios:

- if the service rate = the arrival rate, the blocking probability drops to roughly 0.2%. Of course there is now a non-zero wait probability (close to 9%).

- if the service rate is twice the arrival rate, the blocking probability is 0.006% and the wait probability is about 1.5%.

- if the service rate is 1/10 of the arrival rate, the blocking probability is 70% and the wait probability is 29%.
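
To make these numbers reproducible, here is a small sketch of the standard M/M/c/K state probabilities (the function names mmck_probs and blocking_and_wait are mine; "a" is the same traffic intensity rho as above):

    from math import factorial

    def mmck_probs(c, k, a):
        """Stationary probabilities of an M/M/c/K queue.
        c = servers (threads), k = total capacity (servers + buffer),
        a = traffic intensity (arrival rate / service rate)."""
        weights = [a ** n / factorial(n) if n <= c
                   else a ** n / (factorial(c) * c ** (n - c))
                   for n in range(k + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    def blocking_and_wait(c, k, a):
        p = mmck_probs(c, k, a)
        blocking = p[k]          # arrival finds the system full -> request dropped
        waiting = sum(p[c:k])    # all threads busy, buffer has room -> request queued
        return blocking, waiting

    for a in (1.0, 0.5, 10.0):              # the three scenarios above
        b, w = blocking_and_wait(3, 6, a)   # 3 threads + buffer of 3
        print(f"rho={a}: blocking={b:.4%}, wait={w:.2%}")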

This means that a buffer reduces request drops due to busy resources, but also introduces a waiting probability. Pretty obvious. Another obvious thing is that you need additional memory for that queue length. Assuming queue length = 3, and 1 KB messages, you need 3 KB of additional memory.

A less obvious thing is that you are adding a new component. Assuming "in series" behavior, i.e. requests cannot be processed when the buffer system is down, this decreases overall availability if the queue is not properly sized. What I mean is: if the process crashes when it uses more than 4 KB of memory, but you allow queue sizes up to 3 (3 KB for in-flight requests + 3 KB for the buffer = 6 KB), availability is not 100%, because in some cases the system accepts more requests than it can actually handle.
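
One way to make this concrete, reusing mmck_probs from the sketch above and assuming 1 KB per in-system request and the 4 KB crash threshold from the example:

    # (continuing the script above, which defines mmck_probs)
    # States with 5 or 6 requests in the system use more than 4 KB, so the
    # long-run fraction of time spent in "crash-prone" states (for rho = 1) is:
    p = mmck_probs(3, 6, 1.0)
    crash_prone = sum(p[5:])
    print(f"time above the 4 KB threshold: {crash_prone:.2%}")   # roughly 0.9%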

An even less obvious thing is that the availability picture changes if you consider the server and the buffer as having distinct "size" (memory) thresholds. Things get even more complicated if the server and the buffer are connected by a link which itself doesn't have 100% availability, because then you also have to take the link's unavailability into account.
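
For the "in series" case, the usual first-order estimate is simply the product of the component availabilities; with purely hypothetical figures:

    # Hypothetical availabilities for the server, the buffer/queue, and the link between them.
    a_server, a_queue, a_link = 0.999, 0.999, 0.9995
    a_end_to_end = a_server * a_link * a_queue
    print(a_end_to_end)   # ~0.9975 -> lower than any single component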