Really!
I've written about this in my comments here...
Here's a brief summary:
- typical thread-per-client programming is terribly wasteful: every client needs a large stack that must be able to grow (within reason), which leads programmers to smear client state all over the stack, so the memory and _cache_ footprint of per-client state is huge even though that state is highly compressible -- this inefficiency is what kills you when you want to C10K (serve 10,000 concurrent clients), and it's what led to the C10K techniques of the 90s
- you can compress said per-client program state the most by using continuation passing style (CPS) async I/O programming, either hand-coded or using modern async functions in languages that have them -- this approach incentivizes the programmer to pack program state into a structure much smaller than an execution stack, which greatly reduces the per-client memory and _cache_ footprint of the program (see the sketch right after this list)
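To make the contrast in those two bullets concrete, here's a minimal asyncio sketch in Python (the echo "protocol" and all names are made up for illustration). The per-client footprint is one coroutine frame plus one small struct, not a thread stack sized for the worst case:

```python
import asyncio
from dataclasses import dataclass

# All per-client state lives in this small struct plus the coroutine's
# frame -- a few hundred bytes -- rather than a growable thread stack.
@dataclass
class ClientState:
    addr: tuple
    bytes_in: int = 0

async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    state = ClientState(addr=writer.get_extra_info("peername"))
    try:
        # Each await is a suspension point: the coroutine yields the CPU
        # without holding a stack, so thousands of these stay cheap.
        while line := await reader.readline():
            state.bytes_in += len(line)
            writer.write(line)          # toy echo "protocol"
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8000)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

Ten thousand of these coroutines occupy megabytes; ten thousand default thread stacks reserve gigabytes of address space and, crucially, scatter hot state across far more cache lines.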
Note that reducing the memory and cache footprint of a program also reduces its memory bandwidth footprint, increases cache locality, and reduces latency. This means you can serve more clients with the same hardware. THIS is the point of C10K techniques.
All other techniques like fibers and green threads sit on the spectrum from hand-coded CPS async I/O programming to thread-per-client sequential programming. You get to pick how efficient your code is going to be.
Now, when you apply C10K techniques you get to have one thread per CPU -- look ma, no context switches! -- which naturally improves latency and efficiency. But there's another neat thing about thread-per-CPU: it becomes easier to manage the overall health of the service, at least per CPU, because you can now see all your connected clients, so you can have eviction policies for connections, and admission policies too. In particular you can set a maximum number of connections, picking maxima that only slightly oversubscribe the hardware's capabilities.
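A rough sketch of the thread-per-CPU shape with a connection cap, again in Python for brevity (MAX_CONNS_PER_LOOP and the port are made-up numbers, SO_REUSEPORT is Linux-specific, and with CPython's GIL you'd really use one process per CPU -- the structure is the same in C or Rust):

```python
import asyncio
import os
import socket
import threading

MAX_CONNS_PER_LOOP = 5000    # made-up cap: slight oversubscription per core

def make_listener(port: int) -> socket.socket:
    # SO_REUSEPORT (Linux) lets every per-CPU loop accept on its own copy
    # of the socket; the kernel spreads incoming connections across them.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen(1024)
    return sock

async def serve_on_this_cpu(port: int) -> None:
    conns = 0    # loop-local state: no locks, no cross-CPU cache traffic

    async def handler(reader: asyncio.StreamReader,
                      writer: asyncio.StreamWriter) -> None:
        nonlocal conns
        if conns >= MAX_CONNS_PER_LOOP:    # admission policy: hard cap
            writer.close()
            return
        conns += 1
        try:
            await reader.readline()        # per-client protocol goes here
        finally:
            conns -= 1
            writer.close()

    server = await asyncio.start_server(handler, sock=make_listener(port))
    async with server:
        await server.serve_forever()

def main() -> None:
    # One event-loop thread per CPU; in C you'd also pin each thread to
    # a core, and with CPython's GIL you'd use processes instead.
    for _ in range(os.cpu_count() or 1):
        threading.Thread(target=asyncio.run,
                         args=(serve_on_this_cpu(8000),),
                         daemon=True).start()
    threading.Event().wait()    # park the main thread forever

if __name__ == "__main__":
    main()
```

Because each loop owns its clients outright, the cap and any eviction or admission policy need no locks and no cross-CPU coordination.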
Otherwise [and even if you do the thread-per-CPU thing, though circuit breakers matter less if you do] you must have some way to measure the health of your service, you need to monitor it, and you need your monitor to be able to "break the circuit" by telling your service to start rejecting new work. This is where HTTP status 429 and similar come into play -- they're just a way to express backpressure to clients, though flow control will also do, if you have it available to exercise.

You'll still need to monitor load, latencies, and throughput for thread-per-CPU services, naturally, so you know when you need to add hardware. And of course you'll want to build services you can scale horizontally as much as possible so that adding hardware is easy, though you also need to be able to find and stop pathological clients (dealing with DoS and DDoS almost always requires components external to your services).
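Here's a minimal circuit-breaker sketch (the latency threshold, cooldown, and handler are made-up stand-ins): a monitor records observed latency, trips the breaker when it climbs too high, and the service answers 429 until the cooldown expires.

```python
import time

def do_work(payload: str) -> str:
    return payload.upper()    # stand-in for real request processing

class CircuitBreaker:
    def __init__(self, latency_limit_s: float = 0.5,
                 cooldown_s: float = 5.0) -> None:
        self.latency_limit_s = latency_limit_s
        self.cooldown_s = cooldown_s
        self.open_until = 0.0

    def record_latency(self, seconds: float) -> None:
        if seconds > self.latency_limit_s:    # service looks unhealthy
            self.open_until = time.monotonic() + self.cooldown_s

    def allow(self) -> bool:
        return time.monotonic() >= self.open_until

breaker = CircuitBreaker()

def handle_request(payload: str) -> tuple[int, str]:
    if not breaker.allow():
        # Backpressure: tell the client to go away and retry later.
        return 429, "Too Many Requests"
    start = time.monotonic()
    result = do_work(payload)
    breaker.record_latency(time.monotonic() - start)
    return 200, result
```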
Make sure all middleware can respond appropriately to backpressure, including having their own circuit breakers, and you have a pretty resilient system -- one that under extreme pressure will be able to shed load and continue to function to some degree.
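For the middleware side, a minimal sketch of what "respond appropriately" can mean (the URL and limits are hypothetical): back off exponentially with jitter on 429 instead of hammering the service harder, and eventually give up so the backpressure propagates to your own callers.

```python
import random
import time
import urllib.error
import urllib.request

def fetch_with_backoff(url: str, attempts: int = 5) -> bytes:
    delay = 0.1
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise                   # only 429 means "back off and retry"
            time.sleep(delay + random.uniform(0, delay))
            delay *= 2                  # exponential backoff with jitter
    raise TimeoutError(f"{url} still shedding load after {attempts} attempts")
```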
You'll need to be able to express client priorities (so that you can keep servicing part of your load) and quotas (so that pathological high-priority clients don't DoS you).
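One way to sketch that combination (all the numbers are invented): a priority cutoff you raise under pressure, plus a per-client token bucket so even high-priority clients stay within quota.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    rate: float                 # tokens refilled per second (the quota)
    burst: float                # bucket capacity
    tokens: float = 0.0
    last: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        self.tokens = self.burst    # start with a full bucket

    def try_take(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def admit(client_id: str, priority: int, cutoff: int) -> bool:
    if priority < cutoff:       # shed low-priority work first
        return False
    bucket = buckets.setdefault(client_id,
                                TokenBucket(rate=100.0, burst=200.0))
    return bucket.try_take()    # quota applies even at high priority
```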
There's much more to say, naturally.
BTW, I checked today and LLMs seem to know all this stuff, and more than that, they'll be able to point you to frameworks, blogs, and lots of other docs. That said, if you don't prompt them to tell you about the thread-per-CPU stuff, they won't.
Keep in mind that C10K techniques are expensive in terms of developer time, especially for junior developers.