cryptonector 3 days ago

Threads are definitely not _the_ answer but _an_ answer.

You can have as many threads as hardware threads, but within each thread you want continuation-passing style (CPS) or async-await (which is largely syntactic sugar for CPS). Why? Because threads let you smear program state over a large stack, inflating your memory footprint, while CPS / async-await forces you to make all the state explicit and compact. This is not a small thing. If you have a thread-per-client service, each thread needs a sizeable stack, and each stack needs a guard page -- even with virtual memory that's expensive, both to set up and in terms of total memory footprint.

Between memory per client, L1/L2 cache footprint per client, page faults (to grow the stack), and context-switching overhead, thread-per-client is much more expensive than NPROC threads doing CPS or async-await. If you compress the per-client program state, you can fit more clients in the same amount of memory, and the overhead of switching from one client to the next is lower, so you can serve more clients.
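
To put rough numbers on that, here's a minimal sketch in Python (CPython + asyncio; exact figures vary by platform, and the 8 MiB default is a common Linux value, not a universal one):

```python
import sys
import threading

# Thread-per-client: every client pays for a whole stack. On Linux the
# default is commonly ~8 MiB of virtual address space per thread, plus a
# guard page; threading.stack_size() can shrink that, but only so far.
threading.stack_size(512 * 1024)  # 512 KiB is close to the practical floor

async def handle_client():
    buf = b""  # per-client state lives in the coroutine's frame, not a stack
    # ... await reads/writes here ...

# Coroutine-per-client: the suspended coroutine object holds only its
# live locals -- typically a few hundred bytes per client in CPython.
coro = handle_client()
print(sys.getsizeof(coro))
coro.close()  # silence the "coroutine was never awaited" warning
```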

This is the reason that async I/O is the key to solving the "C10K" problem: it forces the programmer to compress per-client program state.
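
Concretely, that's what an event loop over epoll/kqueue buys you. A toy sketch using Python's selectors module (which wraps epoll/kqueue): the only per-client state is what you explicitly keep, here just the registered socket; error handling and write buffering are omitted:

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS

def accept(listener):
    conn, _ = listener.accept()
    conn.setblocking(False)
    # All per-client state is this registration: a socket and a callback.
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # toy code: real code would buffer partial writes
    else:
        sel.unregister(conn)
        conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 9000))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, accept)

while True:
    for key, _ in sel.select():
        key.data(key.fileobj)  # dispatch to accept() or echo()
```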

But if you don't need to cater to C10K (or C10M) then thread-per-client is definitely simpler.

So IMO it's really about trade-offs. Does your service need to be C10K? How much are you paying for the hardware/cloud you're running it on? And so on. Being more efficient costs more developer cycles -- and those can be very expensive, which is why research into async-await is ongoing: hopefully it can make C10K development cheaper.

But remember, rewrites cost even more than doing it right the first time.

bvrmn 2 days ago

> Does your service need to be C10K?

That's the wrong question. The right one is: "Can your downstream services handle C10K?" For example, a service backed by a database should almost never be bothered by the C10K problem, unless most requests can skip database access.
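
A sketch of why, in Python/asyncio (the pool size is made up for illustration): if the database pool has 20 connections, accepting 10,000 concurrent clients just means 9,980 of them wait in line.

```python
import asyncio

DB_POOL_SIZE = 20  # hypothetical downstream limit, e.g. a connection pool
db_slots = asyncio.Semaphore(DB_POOL_SIZE)

async def handle_request(i):
    # However many clients the frontend accepts, at most DB_POOL_SIZE
    # requests ever touch the database concurrently; the rest just sit
    # here holding per-request state while they wait for a slot.
    async with db_slots:
        await asyncio.sleep(0.01)  # stand-in for the actual query

async def main():
    await asyncio.gather(*(handle_request(i) for i in range(10_000)))

asyncio.run(main())
```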

Every time you introduce backpressure handling in a C10K-ready app, it's a red flag that you should simply use threads.

cryptonector 2 days ago

I think you're saying that a database can't be C10K. Why? You don't say, but I imagine you mean because it's I/O bound, not CPU bound. That may be true, but it may also not be: consider an all-in-memory database (no paging), which will not be I/O bound.

> Every time you introduce backpressure handling in C10K-ready app it's a red flag you should simply use threads.

That's an admission that threads are slower. And I don't see why you wouldn't want ways to express backpressure. You need backpressure when there are impedance mismatches in performance capabilities; making all parts of your system equally slow instead is not an option.
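
Expressing it doesn't have to be heavyweight, either. A minimal asyncio sketch (sizes and rates are made up): the bounded queue is the backpressure, since the fast producer suspends whenever the slow consumer falls behind.

```python
import asyncio

async def producer(q):
    for i in range(100):
        await q.put(i)  # suspends once the queue is full: backpressure

async def consumer(q):
    while True:
        item = await q.get()
        await asyncio.sleep(0.01)  # the slow downstream stage
        q.task_done()

async def main():
    q = asyncio.Queue(maxsize=8)  # the bound is the backpressure policy
    task = asyncio.create_task(consumer(q))
    await producer(q)
    await q.join()  # wait for in-flight items, then stop the consumer
    task.cancel()

asyncio.run(main())
```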

bvrmn a day ago

> I think you're saying that a database can't be C10K.

I did not say that. But it's true that most relational databases behind typical applications can't handle C10K.

> That's an admission that threads are slower.

It depends heavily on the async runtime. And even if threads are slower, I'd pick a thread-based system any day over async/await or channel spaghetti. My experience is largely with Python and Go, where it's quite easy to miss something and end up with a broken app.

> I don't see why you wouldn't want ways to express backpressure.

It's additional code, usually quite messy and fragile. Systems with backpressure handling in every component are hard to maintain, and issues take longer to investigate.

If you need to hold many long-lived connections, it's almost always more convenient to split the system into a small event-based (epoll, kqueue, io_uring) frontend and a multiprocess and/or multithreaded backend. The frontend could even be written with async/await, if nginx is not suitable.
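
Roughly this shape, sketched in Python with asyncio standing in for the event-based frontend and a small thread pool as the backend (names and sizes are illustrative, not a recommendation):

```python
import asyncio
import concurrent.futures

# Small, bounded backend pool: this is where the real (blocking) work runs.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=16)

def backend(request):
    # Ordinary synchronous code: database calls, CPU work, etc.
    return request.upper()

async def handle(reader, writer):
    # The cheap event-based frontend: holds the long-lived connection,
    # hands each request off to the pool, and awaits the result.
    loop = asyncio.get_running_loop()
    while data := await reader.readline():
        reply = await loop.run_in_executor(pool, backend, data)
        writer.write(reply)
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 9000)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```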