mbid 8 hours ago

How many systems are there that can't just spawn a thread for each task they have to work on concurrently? This has to be a system that A) is CPU or memory bound (since async doesn't make disk or network IO faster) and B) must work on ~tens of thousands of tasks concurrently, i.e. can't just queue up tasks and work on only a small number concurrently. The only meaningful examples I can come up with are load balancers, embedded software, and perhaps something like browsers. But e.g. an application server implementing a REST API that needs to talk to a database anyway to answer each request doesn't really qualify, since the database connection and the work the database itself does are likely much more resource intensive than the overhead of a thread.

YZF 5 hours ago | parent | next [-]

Pretty much anything that needs performance and has a lot of relatively light operations is a poor candidate for spawning a thread per operation. Context switching and the cost of threads will kill performance. A server spawning a thread per request for relatively lightweight requests is going to be extremely slow. But sure, if every REST call results in a 10s database query then threads aren't your bottleneck. A database query can also be very fast (thanks to caches, indices, etc.), so it's not a given that just because you're talking to a database you can just spin up new threads and it'll be fine.

EDIT: Something else to consider: what if your REST call needs to make 5 queries? Do you serialize them? Now your latency can be worse. Do you launch a thread per query? Now you need to a) synchronize and b) pay 5x the thread cost. Async patterns, green threads, or coroutines enable more efficient overlapping of operations and potentially better concurrency (though a server that handles lots of concurrent requests may already have "enough" concurrency anyway).
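To make the 5-queries point concrete, here's a minimal sketch in Python's asyncio (the queries and the 50 ms delay are simulated with asyncio.sleep; the numbers are arbitrary assumptions):

```python
import asyncio
import time

async def fake_query(ms: int) -> int:
    # Stand-in for one database query: awaiting yields the single
    # event-loop thread instead of blocking a whole OS thread.
    await asyncio.sleep(ms / 1000)
    return ms

async def serial(delays):
    # Five queries one after another: latency is the *sum* of the waits.
    return [await fake_query(d) for d in delays]

async def overlapped(delays):
    # The same five queries launched together: latency is roughly the
    # *max* wait, with no extra threads and no cross-thread locking.
    return await asyncio.gather(*(fake_query(d) for d in delays))

delays = [50] * 5

t0 = time.perf_counter()
asyncio.run(serial(delays))
serial_s = time.perf_counter() - t0  # ~5 x 50 ms

t0 = time.perf_counter()
asyncio.run(overlapped(delays))
overlapped_s = time.perf_counter() - t0  # ~50 ms
```

The serialized version pays all five waits back to back; the overlapped one pays for the slowest query only, without the per-thread cost.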

layer8 4 hours ago | parent | next [-]

Server applications don’t spawn threads per request, they use thread pools. The extra context switching due to threads waiting for I/O is negligible in practice for most applications. Asynchronous I/O becomes important when the number of simultaneous requests approaches the number of threads you can have on your system. Many applications don’t come close to that in practice.

There’s a benefit in being able to code the handling of a request as synchronous logic. A case has to be made, for the particular application, that this would cause performance or resource issues before opting for asynchronous code that adds more complexity.
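That thread-pool-plus-synchronous-handler style looks roughly like this in Python (the handler, pool size, and 10 ms delay are made up for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(req_id: int) -> str:
    # Plain synchronous logic: the thread simply blocks on I/O.
    # The pool, not the handler code, bounds resource usage.
    time.sleep(0.01)  # stand-in for a blocking database call
    return f"response {req_id}"

# A fixed pool of 8 worker threads serves 32 requests; at most 8
# run (or wait on I/O) at once, and the rest queue.
with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(handle_request, range(32)))
```

The handler reads top to bottom with no callbacks or awaits, which is exactly the complexity argument being made here.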

YZF 3 hours ago | parent | next [-]

Thread pools are another variation on the theme. But if your threads block then your pool saturates and you can't process any more requests. So thread pools still need non-blocking operations to be efficient or you need more threads. If you have thread pools you also need a way of communicating with that pool. Maybe that exists in the framework and you don't worry about it as a developer. If you are managing a pool of threads then there's a fair amount of complexity to deal with.
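The saturation effect is easy to demonstrate: with blocking handlers, throughput is capped at roughly pool size divided by blocking time (the pool size and 50 ms delay below are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_handler(_):
    time.sleep(0.05)  # each request blocks ~50 ms on "I/O"

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # 8 requests over 2 threads -> ~4 rounds of 50 ms each: the
    # saturated pool makes later requests wait even though the CPU
    # is idle the entire time.
    list(pool.map(blocking_handler, range(8)))
elapsed = time.perf_counter() - t0
```

The fix is either more threads (more memory, more context switches) or non-blocking operations so the same threads can make progress elsewhere.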

I totally agree there are applications for which this is overkill and adds complexity. It's just a tool in the toolbox. Video games famously are just a single thread/main loop kind of application.

acdha 3 hours ago | parent | prev [-]

There’s also a really good operational benefit if you have limits like total RAM, database connections, etc. where being able to reason about resource usage is important. I’ve seen multiple async apps struggle with things like that because async makes it harder to reason about when resources are released.

tcfhgj an hour ago | parent [-]

Could you point out the issue here?

Why does async make it harder to reason about when resources are released?

otabdeveloper4 17 minutes ago | parent [-]

Because async usually means you've stopped having "call stack" as a useful abstraction.

otabdeveloper4 18 minutes ago | parent | prev [-]

> Context switching

No such thing. In a preemptive multitasking OS (that's basically all of them today) you will get context switching regardless of what you do. Most modern OSes don't even give you the tools to mess with the scheduler at all; the scheduler knows best.

anonymars 8 hours ago | parent | prev | next [-]

I'm not sure this is the correct mental model of what async solves.

Async precisely improves disk/network I/O-bound applications, because synchronous code has to waste a whole thread sitting around waiting for an I/O response (each with its own stack memory and scheduler overhead), and in something like an application server there will be many incoming requests doing so in parallel. Cancellation is also easier with async.

CPU-bound code would not benefit because the CPU is already busy, and async adds overhead

See e.g. https://learn.microsoft.com/en-us/aspnet/web-forms/overview/... and https://learn.microsoft.com/en-us/aspnet/web-forms/overview/...

likeabbas 6 hours ago | parent | next [-]

I have some test code that runs a comparison of Hyper pre-async (aka thread per request) vs async (via Tokio), and the pre-async version is able to process more requests per second in every scenario (I/O, CPU-heavy tasks, shared memory).

I'll publish my results shortly. I did these as baselines because I'm working on finishing the User Managed Concurrency Groups (UMCG) proposal for the Linux kernel, an extension that provides faster kernel threads (which beat both of them).

otabdeveloper4 14 minutes ago | parent | next [-]

Async only exists because languages like Python and JavaScript have global interpreter locks that don't play nice with threads.

Using async for languages like Rust or C++ is cargo cult by people who don't know what the hell they're doing.

[Caveat: there's a use case for async if you're doing embedded development where you don't have threads or call stacks at all.]

iknowstuff 5 hours ago | parent | prev [-]

How many concurrent requests?

likeabbas 4 hours ago | parent [-]

I'll have to check my work computer on Monday. It was an 8-CPU virtual machine on an M1 Mac. The UMCG and normal-thread versions were set to 1024 threads on the server; the Tokio version was 2 threads per core. Just from the top of my head: the I/O-bound requests topped out around 40k/second for the Tokio version, 60k/second for the normal Hyper version, and 80k/second for the UMCG Hyper version.

I'm pretty close to being done - I'm hoping to publish the entire GitHub repository with tests for the community to validate by next week.

UMCG is essentially an open source version of Google Fibers, which is their internal extension to the Linux kernel for "lightweight" threads. It requires you to build a user-space scheduler, but that also lets you create different types of schedulers. I can't remember which scheduler showed ^ results, but I have at least 6 different UMCG schedulers I was testing.

So essentially you get the benefits of something like Tokio, where you can have different types of schedulers optimized for different use cases, but with the power of kernel threads, which means easy cancellation and easy programming (at least in Rust). It's still a Linux thread with an entire 8 MB(?) stack, but from my testing it's far faster than what Tokio can provide, without the headache of async/await programming.

mbid 8 hours ago | parent | prev | next [-]

I read this argument ("async is for I/O-bound applications") often, but it makes no sense to me. If your app is I/O bound, how does reducing the work the (already idling!) CPU has to spend on context switching improve the performance of the system?

ndriscoll 7 hours ago | parent | next [-]

IO bound might mean latency but not throughput, so you can up concurrency and add batching, both of which require more concurrent requests in flight to hit your real limit. IO bound might also really mean contention for latches on the database, and different types of requests might hit different tables. Basically, I see people say they're IO bound long before they're at the limit of a single disk, so obviously they are not IO bound. Modern drives are absurdly fast. If everyone were really IO bound, we'd need 1/1000 the hardware we needed 10-15 years ago.

anonymars 7 hours ago | parent | prev | next [-]

It sounds like you're assuming both pieces are running on the same server, which may not be the case (and if you're bottlenecked on the database it probably shouldn't be, because you'd want to move that work off the struggling database server)

Assuming for the sake of argument that they are together, you're still saving stack memory for every thread that isn't created. In fact you could say it allows the CPU to be idle, by spending less time context switching. On top of that, async/await is a perfect fit for OS overlapped I/O mechanisms for similar reasons, namely not requiring a separate blocking thread for every pending I/O (see e.g. https://en.wikipedia.org/wiki/Overlapped_I/O, https://stackoverflow.com/a/5283082)
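A small illustration of the "no blocking thread per pending I/O" point in Python: a thousand concurrent waits multiplexed on a single event-loop thread (the waits are simulated with asyncio.sleep; the counts are arbitrary):

```python
import asyncio
import threading

async def pending_io(n: int) -> int:
    # Each task parks in the event loop while "waiting on I/O";
    # no OS thread (and no per-thread stack) is held in the meantime.
    await asyncio.sleep(0.05)
    return n

async def main():
    # 1000 concurrent waits, all multiplexed on one thread.
    return await asyncio.gather(*(pending_io(i) for i in range(1000)))

threads_before = threading.active_count()
results = asyncio.run(main())
threads_after = threading.active_count()
# 1000 concurrent "I/O waits" completed without spawning 1000 threads
# and their stacks.
```

The synchronous equivalent would need a thread (with its own stack) per pending wait, which is the overlapped-I/O argument in miniature.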

mbid 7 hours ago | parent [-]

Right, I think the argument should be that transitioning from a synchronous to an asynchronous programming model can improve the performance of a previously CPU/memory-bound system so that it saturates the I/O interface.

anonymars 6 hours ago | parent [-]

If the system is CPU-bound doing useful work, that's not the case. Async shines when there are a lot of "tasks" that are not doing useful work, because they are waiting (e.g. on I/O). Waiting threads waste resources. That's what async greatly improves.

charlieflowers 7 hours ago | parent | prev [-]

The simplest example is that you can easily be wasteful in your use of threads. If you just write blocking code, you will block the thread while waiting on io, and threads are a finite resource.

So avoiding that would mean a server can handle more traffic before running into limits based on thread count.

pocksuppet 6 hours ago | parent | prev [-]

Inversion of thought pattern: Why is a thread such a waste that we can't have one per concurrent request? Make threads less wasteful instead. Go took things in this direction.

anonymars 5 hours ago | parent [-]

How do you suggest we just "make threads less wasteful"?

I mean, I suppose we could move the scheduling and tracking out of kernel mode and into user mode...

But then guess what we've just reinvented?

ozgrakkurt 8 hours ago | parent | prev | next [-]

Async does make NVMe I/O faster: queueing multiple operations on the device itself is faster than submitting them one at a time.

default-kramer 6 hours ago | parent | prev [-]

I think it's another case of the whole industry being driven by the needs of the very small number of systems that need to handle >10k concurrent requests.

cmrdporcupine 3 hours ago | parent [-]

Or biases inherited from deploying on single or dual core 32-bit systems from 20 years ago.

Honestly, it's a mostly obsolete approach. OS threads are fast. We have lots of cores. The cost of bouncing around on the same core and losing L1 cache locality is higher than the cost of firing up a new OS thread that could land on a new core.

The kernel scheduler gets tuned. Language-specific async runtimes are unlikely to see so many eyeballs.