mkozlows 5 days ago

(For context, I've conducted hundreds of system design interviews and trained a dozen other people on how to do them at my company. Other interviewers may do things differently or care about other things, but I think what I'm saying here isn't too far off normal.)

I think three things about what you're saying:

1. The answers you're giving don't provide a lot of signal (the queue one being the exception). The question that's implicitly being asked is not just what you would choose, but why you would choose it. What factors would drive you to a particular decision? What are you thinking about when you provide an answer? You're not really verbalizing your considerations here.

A good interviewer will pry at you to get the signal they need to make a decision. So if you say that back-pressure isn't worth worrying about here, they'll ask you when it would be, and what you'd do in that situation. But not all interviewers are good interviewers, and sometimes they'll just say "I wasn't able to get much information out of the candidate" and the absence of a yes is a no. As an interviewee, you want to make the interviewer's job easy, not hard.

2. Even if the interviewer is good and does pry the information out of you, they're probably going to write down something like "the candidate was able to explain sensibly why they'd choose a particular technology, but it took a lot of prodding and prying to get the information out of them -- communications are a negative." As an interviewee, you want to communicate all the information your interviewer is looking for proactively, not grudgingly and reluctantly. (This is also true when you're not interviewing.)

3. I pretty much just disagree on that SQL/NoSQL answer. Team expertise is one factor, but those technologies have significant differences; depending on what you need to do, one of them might be way better than the other for a particular scenario. Your answer there is just going to get dinged for indicating that you don't have experience in enough scenarios to recognize this.

philjohn 5 days ago | parent | next [-]

+1 on the signal. A great candidate won't need you to pry further than asking about backpressure: they'll explain WHY it's not necessary at this qps, at what qps it would start becoming necessary, and how they would build it into their design down the line if the service takes off.

One of the things I tell people preparing for system design interviews is the more senior you are, the more you need to drive the interview yourself, knowing when to go deep, what to go deep on, and how to give the most signal to the interviewer.

cryptonector 5 days ago | parent [-]

As the interviewee I'd use such a question to demonstrate my knowledge of the topic. For this question I'd point out that thread-per-CPU async I/O designs (with a cap on live connections, clever connection pool acceptance & eviction policies, and limited buffering) intrinsically limit oversubscription, but I would still talk extensively about health monitoring and external circuit breaking, as well as the use of 429s and/or flow control (where available) at the protocol level to express backpressure. I would then use this to harp on the evils of thread-per-client designs, why they happen, and the various alternatives.

Make the interviewer tell you when they've had enough and change topics.

iamcreasy 5 days ago | parent | next [-]

I beg you to write an article expanding on these points. I'd pay to read it.

cryptonector 5 days ago | parent [-]

Really!

I've written about this in my comments here...

Here's a brief summary:

- typical thread-per-client programming is terribly wasteful because it needs large stacks that must be able to grow (within reason), and this leads programmers to smear client state all over the stack, which means that the memory and _cache_ footprint of per-client state is huge even though the state is highly compressible; this is what reduces efficiency when you want to serve 10,000 clients, and what led to C10K techniques in the 90s

- you can most highly compress said per-client program state by using continuation passing style (CPS) async I/O programming, either hand-coded or using modern async functions in languages that have them -- this approach tends to incentivize the programmer to compress program state into a much smaller structure than an execution stack, which therefore greatly reduces the per-client memory and _cache_ footprint of the program

Note that reducing the memory and cache footprint of a program also reduces its memory bandwidth footprint, increases cache locality, and reduces latency. This means you can serve more clients with the same hardware. THIS is the point of C10K techniques.
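To make the contrast concrete, here's a sketch in Python's asyncio (the echo handler and all names are invented for illustration, not taken from any particular framework): per-client state lives in a small explicit object plus a compact coroutine frame, instead of a megabyte-scale dedicated thread stack per client.

```python
import asyncio
from dataclasses import dataclass, field

# Per-client state compressed into a small explicit structure instead of being
# smeared across a large thread stack. The event loop multiplexes many such
# objects on one thread; each "await" parks only this object and a small frame.
@dataclass
class ClientState:
    bytes_read: int = 0
    buffer: bytearray = field(default_factory=bytearray)

async def handle_client(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter) -> None:
    state = ClientState()                    # tens of bytes of live state
    while data := await reader.read(4096):   # suspension point: state survives
        state.bytes_read += len(data)
        writer.write(data)                   # simple echo back to the client
        await writer.drain()
    writer.close()

async def main() -> None:
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()
```

The compiler/interpreter turns each async function into a resumable state machine, which is exactly the "compressed program state" described above, just generated for you instead of hand-coded CPS.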

All other techniques like fibers and green threads sit on the spectrum from hand-coded CPS async I/O programming to thread-per-client sequential programming. You get to pick how efficient your code is going to be.

Now, when you apply C10K techniques you get to have one thread-per-CPU -- look ma'! no context switches -- which also improves latency and efficiency, naturally. But there's another neat thing to thread-per-CPU: it becomes easier to manage the overall health of the service, at least per-CPU, because you can now manage all your connected clients, and so you can have eviction policies for connections, and admittance policies too. In particular you can set a maximum number of connections, which means that you can set maxima that only slightly oversubscribe the hardware's capabilities.
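A minimal sketch of such an admittance policy (the cap and all names are illustrative): each event-loop thread tracks its live connections and refuses new ones outright once the cap is hit, rather than queueing unbounded work.

```python
import asyncio

class AdmissionControl:
    """Admittance policy: refuse new connections once the cap is hit, so the
    service only slightly oversubscribes what the hardware can actually serve."""
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.live = 0

    def try_admit(self) -> bool:
        if self.live >= self.limit:
            return False          # intrinsic backpressure: no unbounded queueing
        self.live += 1
        return True

    def release(self) -> None:
        self.live -= 1

admission = AdmissionControl(limit=1000)   # illustrative cap per event loop

async def handle(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    if not admission.try_admit():
        writer.close()            # reject immediately instead of degrading everyone
        return
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        admission.release()
        writer.close()
```

A real eviction policy would also kick idle or pathological connections out to make room, but the shape is the same: the per-loop cap is what makes oversubscription bounded by construction.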

Otherwise [and even if you do the thread-per-CPU thing, though it's less important to have circuit breakers if you do] you must have some way to measure the health of your service, and you need to monitor it, and you need your monitor to be able to "break the circuit" by telling your service to start rejecting new work. This is where HTTP status 429 and similar come into play -- it's just a way to express backpressure to clients, though flow control will also do, if you have that available to exercise. You'll still need to be able to monitor load, latencies, and throughput for thread-per-CPU services, naturally, so you know when you need to add HW. And of course you'll want to build services you can scale horizontally as much as possible so that adding hardware is easy, though you also need to be able to find and stop pathological clients (dealing with DoS and DDoS almost always requires components external to your services).
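A hedged sketch of that "429 as backpressure" idea (thresholds and names invented): a tiny circuit breaker that trips open after repeated failures and answers 429 instead of doing real work, so the service sheds load and the health endpoints stay reachable.

```python
import time
from typing import Optional

class CircuitBreaker:
    """Trips open once recent failures cross a threshold; while open, the
    service answers 429 instead of doing real work, shedding load so CPU is
    left over for health/diagnostics endpoints."""
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None    # half-open: let traffic probe again
            self.failures = 0
            return True
        return False

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

def handle_request(breaker: CircuitBreaker) -> int:
    if not breaker.allow():
        return 429                   # backpressure expressed at the protocol level
    return 200                       # ...do the real work here
```

In practice the "failure" signal would come from the external health monitor (latency percentiles, timeouts, queue depth) rather than a bare counter, but the open/half-open/closed shape is the same.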

Make sure all middleware can respond appropriately to backpressure, including having their own circuit breakers, and you have a pretty resilient system -- one that under extreme pressure will be able to shed load and continue to function to some degree.

You'll need to be able to express client priorities (so that you can keep servicing part of your load) and quotas (so that pathological high-priority clients don't DoS you).
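One way to sketch both ideas together (all rates, thresholds, and names invented): per-client token-bucket quotas bound every client, and under load the priority floor rises so low-priority work is shed first.

```python
import time
from collections import defaultdict

class QuotaTable:
    """Per-client token buckets: even high-priority clients are capped, so a
    pathological client can't DoS you."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = defaultdict(lambda: burst)       # start with a full bucket
        self.stamp = defaultdict(time.monotonic)

    def admit(self, client: str) -> bool:
        now = time.monotonic()
        refill = (now - self.stamp[client]) * self.rate
        self.tokens[client] = min(self.burst, self.tokens[client] + refill)
        self.stamp[client] = now
        if self.tokens[client] >= 1.0:
            self.tokens[client] -= 1.0
            return True
        return False

def should_serve(priority: int, load: float,
                 quota: QuotaTable, client: str) -> bool:
    # Under load, raise the priority floor so low-priority work is shed first;
    # quotas still bound whoever gets through. Thresholds are illustrative.
    floor = 0 if load < 0.8 else (1 if load < 0.95 else 2)
    return priority >= floor and quota.admit(client)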

There's much more to say, naturally.

BTW, I checked today and LLMs seem to know all this stuff, and more than that they'll be able to point you to frameworks, blogs, and lots of other docs. That said, if you don't prompt them to tell you about the thread-per-CPU stuff, they won't.

Keep in mind that C10K techniques are expensive in terms of developer time, especially for junior developers.

belZaah 5 days ago | parent | next [-]

There’s also a systems-level rationale for this. Without good isolation, you’ll get a feedback loop: threads start to step on each other’s toes. This leads to slower response times. Which, at a given request pressure, leads to more parallel threads. Which slows them down even more. If a brief peak in pressure pushes response times past a critical point, such a system will never recover, and you’ll get a server furiously computing for no apparent reason, only behaving normally again after a restart.
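That tipping-point behavior can be shown with a toy model (all numbers invented): each in-flight request slows the others via contention, arrivals are constant, and there is an unstable equilibrium. Below it the backlog drains on its own; a brief spike that pushes the backlog past it never drains.

```python
def simulate(arrival: float, base_service: float, contention: float,
             initial_backlog: float, steps: int) -> list[float]:
    """Toy discrete-time model of the contention feedback loop.

    Effective service rate shrinks as in-flight work grows (threads stepping
    on each other's toes). With arrival=8, base_service=10, contention=0.1,
    the unstable tipping point sits at backlog 2.5 (where 10/(1+0.1x) = 8).
    """
    backlog = initial_backlog
    history = []
    for _ in range(steps):
        service = base_service / (1.0 + contention * backlog)
        backlog = max(0.0, backlog + arrival - service)
        history.append(backlog)
    return history
```

Start the same system at backlog 2.0 and it recovers to zero; start it at 3.0 and the backlog grows without bound, i.e. the "furiously computing until restarted" state.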

cryptonector 4 days ago | parent | next [-]

Yes, and thus circuit breakers. By sizing offered capacity to some factor of actual capacity you can limit the effects of too much demand to causing backpressure naturally (rejecting requests) instead of timeouts and retries. This then allows you some level of access -- such as to your health and diagnostics end-points, because CPU usage doesn't become so high that you can't even run those.

jiggawatts 4 days ago | parent | prev [-]

Most systems cap the size of their thread pool and put excess requests into a queue.
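A hedged sketch of that pattern (sizes and names invented): a fixed set of worker threads draining a bounded queue, with submission failing fast once the queue is full, so excess load surfaces as rejection rather than unbounded memory growth.

```python
import queue
import threading

class BoundedPool:
    """Capped thread pool with a bounded work queue: excess requests are
    rejected at submit time instead of piling up without limit."""
    def __init__(self, workers: int, depth: int):
        self.tasks: queue.Queue = queue.Queue(maxsize=depth)
        self.threads = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def _run(self) -> None:
        while True:
            fn, args = self.tasks.get()
            fn(*args)
            self.tasks.task_done()

    def submit(self, fn, *args) -> bool:
        try:
            self.tasks.put_nowait((fn, args))
            return True
        except queue.Full:
            return False   # overload surfaces as rejection (backpressure)
```

The queue depth is the knob: deep queues smooth bursts but hide overload as latency; shallow queues convert overload into prompt rejections the caller can act on.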

jiggawatts 4 days ago | parent | prev [-]

Good summary of the theory, but the weird thing is that every time I’ve rewritten code to use async, total throughput went down by about 10%… which is about what I estimate the overhead of the compiler-generated async state machinery to be.

I’ve yet to see a convincing set of A/B comparisons from a modern language. My experiences don’t line up with the conventional wisdom!

cryptonector 3 days ago | parent [-]

That could be because you're still smearing state on the stack? With async functions one can do that, and so you still have stacks/fibers/threads, and so you've not gained much.

With a CPS approach you really don't have multiple stacks.

Oh, and relatedly the functional core, imperative shell (FCIS) concept comes in here. The imperative shell is the async I/O event loop / executor. Everything else is functional state transitions that possibly request I/O, and if you represent I/O requests as return values to be executed by the executor, then you can have those state transitions be functional. The functional state transition can use as much stack as it wants, but when it's done the stack is gone -- no stack use between state transitions.

Now, naturally you don't want state transitions to have unbounded CPU time, but for some applications you may have to allow it, in which case you have problems (gaaah, thread cancellation is such a pain!).

The point of FCIS is to make it so it's trivial to test the state transitions because there is nothing to mock except one input, one state of the world, and check the output against what's expected. The "imperative shell" can also be tested with a very simple "application" and setup to show that it works w/o having to test the whole enchilada with complex mockup setups.
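A minimal FCIS sketch (all names invented for illustration): the functional core is a pure transition function mapping (state, event) to (new state, list of I/O requests as plain data); the imperative shell is a tiny loop, the only place effects happen. The transition is then testable with no mocks at all.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lines_seen: int = 0

@dataclass(frozen=True)
class Write:            # an I/O *request*, returned as data, not performed
    data: str

def transition(state: State, event: str) -> "tuple[State, list[Write]]":
    """Functional core: pure, no side effects, trivially testable."""
    new = State(lines_seen=state.lines_seen + 1)
    return new, [Write(f"{new.lines_seen}: {event}")]

def shell(events, sink) -> State:
    """Imperative shell: executes the requested I/O; nothing else has effects."""
    state = State()
    for ev in events:
        state, requests = transition(state, ev)
        for req in requests:
            sink(req.data)      # the one effectful call in the whole program
    return state
```

Testing the core is just comparing values: one input, one state of the world, check the output. Testing the shell needs only a trivial "application" (here, appending to a list), no complex mockup setups.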

dennis_jeeves2 3 days ago | parent | prev [-]

Lol

Juliate 4 days ago | parent | prev | next [-]

What you're describing is how the interview process _is_ disconnected from the actual needs, and how it's good to literally "play the game" to get in.

But on the other side, that kind of interview process is itself a signal that candidates might take as a reason to avoid playing the game, since companies probing for the wrong signals during the interview process are often (not always, of course) indicative of how they function as a whole.

(been in both types of companies, and in both sides of the table)

elliotto 5 days ago | parent | prev | next [-]

OP has stated that the system design interview exists as a way to performatively answer a set of questions that don't relate to actually being a good system designer. In your response, you have claimed that their proposed truthful answers don't give a lot of 'signal' and that you would prefer candidates to engage with the performative nature of the process. This is true - but is not an argument against the OP's claim, which is that reciting a set of facts about system design in an interview != being a good system designer.

jackblemming 5 days ago | parent | prev [-]

All that “signal” nonsense can be parroted by both an LLM and someone who read “how to pass system interviews”. Yea, great “signal”.

cryptonector 4 days ago | parent [-]

Not really, not in live, oral interviews.

Though I once had a case of the person we thought we were hiring and the person we got being different people. The fix for that is to always have one final in-person interview.