| ▲ | groundzeros2015 5 hours ago |
| Async seems like an underbaked idea across the board. Regular code was already async: when you need to wait for an async operation, the thread sleeps until ready and the kernel abstracts it away. But we didn’t like structuring code into logical threads, so we added callback systems for events. Then we realized callbacks are very hard to reason about and that sequential control flow is better. So threads were the right programming model. Now language runtimes prefer “green threads” for portability and performance, but most languages don’t provide them properly. Instead we have awkward coloring of async/non-async functions and all these problems around scheduling, priority, and non-preemption. It’s a worse scheduling and process model than we had in 1970. |
|
| ▲ | vlovich123 5 hours ago | parent | next [-] |
| > Regular code was already async. When you need to wait for an async operation, the thread sleeps until ready and the kernel abstracts it away Not really. I’ve observed that async code is often written in a way that doesn’t maximize how much concurrency can be expressed (e.g. instead of writing “here are N I/O operations, do them all concurrently”, it’s “for each operation X, await process(x)”). However, in a threaded world this concurrency problem gets worse, because you have no way to optimize towards such concurrency - threads are inherently and inescapably too heavyweight to express concurrency efficiently. This is not a new lesson - work-stealing executors have long been known to offer significantly lower latency with more consistent P99s than traditional threads. This has been known since forever - it’s why Apple developed GCD in the early 00s. Threads simply don’t give the kernel scheduler any richer information about the workload, and kernel threads are an insanely heavy mechanism for achieving fine-grained concurrency - even worse when that concurrency is I/O-bound or a mixed workload, instead of pure compute that’s embarrassingly easy to parallelize. Do all programs need this level of performance? No, probably not. But it is significantly easier to achieve a higher performance bar, and in practice to reach a latency and throughput level that traditional approaches can’t match with the same level of effort. You can tell async is directionally correct in that io_uring is the kernel’s approach to high-performance I/O, and it looks nothing like traditional threading and syscalls - its completion model looks a lot closer to async concurrency (although, granted, exploiting it fully is much harder in an async world because async/await is an insufficient number of colors to express how async tasks interrelate) |
| |
| ▲ | groundzeros2015 5 hours ago | parent | next [-] | | I am not saying threads are the model for all programming problems. For example, a dependency graph like an Excel spreadsheet can be analyzed and parallelized. But as you observed, async/await fails to express concurrency any better. It’s also a thread, just a worse implementation. | | |
| ▲ | vlovich123 5 hours ago | parent [-] | | That’s incorrect. Even when expressed suboptimally, it still tends to result in overall higher throughput and consistently lower latency (work stealing executors specifically). And when you’re in this world, you can always do an optimization pass to better express the concurrency. If you’ve not written it async to start with, then you’re boned and have no easy escape hatch to optimize with. | | |
| ▲ | groundzeros2015 4 hours ago | parent [-] | | Why can’t you do the same optimization? Are you maxing out your OS’s resources on thread overhead? |
|
| |
| ▲ | Hendrikto 3 hours ago | parent | prev [-] | | > threads are inherently and inescapably too heavy weight to express concurrency in an efficient way Your premise is wrong. There are many counterexamples to this. | | |
| ▲ | LelouBil an hour ago | parent [-] | | Can you explain more? I’ve always heard this. | | |
| ▲ | Hendrikto an hour ago | parent [-] | | The most prominent example is probably Go with its goroutines, but there are many more. You can easily spawn tens of thousands of goroutines, with low overhead and great performance. | | |
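For illustration, a sketch of what "tens of thousands of goroutines" looks like in practice. Each goroutine starts with only a few KB of stack, so a spawn count that would exhaust memory with OS threads (at a typical ~8 MB default stack each) is routine here:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sumConcurrently spawns one goroutine per value of i and adds them all up.
// 100,000 goroutines is routine for the Go runtime; 100,000 OS threads
// would exhaust memory and crush the kernel scheduler.
func sumConcurrently(n int) int64 {
	var sum int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int64) {
			defer wg.Done()
			atomic.AddInt64(&sum, i)
		}(int64(i))
	}
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(sumConcurrently(100_000)) // sum of 0..99999 = 4999950000
}
```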
| ▲ | igregoryca 30 minutes ago | parent [-] | | Goroutines/"fibers"/"green threads" are usually scheduled by the runtime system across a small pool of actual OS threads. |
|
|
|
|
|
| ▲ | nananana9 5 hours ago | parent | prev | next [-] |
| > the thread sleeps until ready and the kernel abstracts it away. Sure, but once you involve the kernel and OS scheduler, things get 2 to 3 orders of magnitude slower than they should be. The last time I was working on our coroutine/scheduling code, creating and joining a thread that exited instantly took ~200us, while creating one of our green threads, scheduling it, and waiting for it took ~400ns. You don't need to wait 10 years for someone else to design yet another absurdly complex async framework; you can roll your own green threads/stackful coroutines in any systems language with 20 lines of ASM. |
| |
| ▲ | groundzeros2015 5 hours ago | parent [-] | | 1. Why can’t we have better green-thread implementations with better scheduling models? 2. Unchecked array operations are a lot faster. Manual memory management is a lot faster. Shared memory is a lot faster. Usually when you see someone reach for sharp, less expressive tools, it’s justified by a hot code path. But here we jump immediately to the perf hack? 3. How many simultaneous async operations does your program actually have? | | |
| ▲ | vlovich123 5 hours ago | parent [-] | | Well, if you offload heavy compute into an async task, then usually it depends strictly on how many concurrent inputs you are given. But even something as “simple” as a text editor benefits from this if done well - that’s why JS text editors have reasonably acceptable performance whereas Java IDEs have always struggled (historically anyway, since even Java has now adopted green threads). | | |
| ▲ | ptx 4 hours ago | parent | next [-] | | Are you sure Java's UI issues are caused by threading and not just Swing being a glitchy pile of junk? For example, if you don't explicitly call the java.awt.Toolkit.sync() method after updating the UI state (which according to the docs "is useful for animation"), Swing will in my experience introduce seemingly random delays and UI lag because it just doesn't bother sending the UI updates to the window system. | |
| ▲ | groundzeros2015 4 hours ago | parent | prev | next [-] | | You think IDEs are written in JS because of the performance benefits of the threading model? I thought it was because they could copy chromium. | | |
| ▲ | vlovich123 4 hours ago | parent [-] | | Why do you think they don’t struggle with input latency? Because the non-blocking nature built into the browser model is so powerful, and you cannot get that with threads. |
| |
| ▲ | PunchyHamster 3 hours ago | parent | prev | next [-] | | Maybe you remember performance of IDEs from 15 years ago because that definitely isn't my experience. | |
| ▲ | jcelerier an hour ago | parent | prev [-] | | > that’s why JS text editors have reasonably acceptable performance Absolutely not |
|
|
|
|
| ▲ | BlackFly 4 hours ago | parent | prev | next [-] |
| I think that callbacks are actually easier to reason about: when it comes time to test your concurrent processing, to ensure you handle race conditions properly, that is much easier with callbacks because you can control their scheduling. Since each callback represents a discrete unit, you can see which events can be reordered, which lets you more easily consider all the different orderings. With threads, by contrast, it is easy to just ignore the orderings, and not to think about this complexity happening in a different thread and when it can influence the current one. That isn't simpler, it is simplistic. Moreover, with threads you cannot really change the scheduling and test the concurrent scenarios without introducing artificial barriers to stall them, or stubbing the I/O so you can pass in a mock that you then instrument with a callback to control the ordering anyway... The real problem with callbacks is that the captured call stack isn't the logical call stack, unless you are in one of the few libraries/runtimes that put in the work to make the call stacks make sense; otherwise you need good error definitions. You can, of course, mix the paradigms and get the worst of both worlds. |
| |
|
| ▲ | usrnm 5 hours ago | parent | prev | next [-] |
| The problem comes from trying to sit on two chairs at once: we want async but also want to be able to opt out. This is what causes most of the ugliness, including function colouring. Just look at golang, where everything is async with no way to change it - it's great. It's probably not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async. Before async, Rust was an interesting and reasonable language; now it's just a hot mess that makes your eyes bleed for no reason. |
| |
| ▲ | vanderZwan 4 hours ago | parent | next [-] | | > It's, probably, not well-suited for things like microcontrollers, where every byte matters, but if you can afford the overhead, it's so much better than Rust async. There is one hill I'll die on, as far as programming languages go, which is that more people should study Céu's structured synchronous concurrency model. It was specifically designed to run on microcontrollers: it compiles down to a finite state machine with very little memory overhead (a few bytes per event). It has some limitations in terms of how its "scheduler" scales when there are many trails activated by the same event, but breaking things up into multiple asynchronous modules would likely alleviate that problem. I'm certain a language that supports the "Globally Asynchronous, Locally Synchronous" (GALS) paradigm could have its cake and eat it too: something that combines support for a green-threading model of choice for async events with structured local reactivity a la Céu. F'Santanna, the creator of Céu, has actually been chipping away at a new programming language called Atmos that does support the GALS paradigm. However, it's a research language that compiles to Lua 5.4, so it won't really compete with the low-level programming languages there. [0] https://ceu-lang.org/ [1] https://github.com/atmos-lang/atmos | |
| ▲ | PunchyHamster 3 hours ago | parent | prev | next [-] | | Everything is not async in Go. If your threads are "free", you can just run 400 copies of synchronous code, and blocking in one just frees the thread to work on others. Async within the same goroutine is still very much opt-in (you have to manually create a goroutine that writes to a channel that you then receive on); it just isn't needed when "spawn a thread for each connection" costs you barely a few KB per connection. | | |
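The "opt-in" shape described above can be sketched as a goroutine plus a channel, which together act as a future; `slowQuery` here is a hypothetical blocking call:

```go
package main

import (
	"fmt"
	"time"
)

// slowQuery stands in for any blocking call (database query, RPC, ...).
func slowQuery() string {
	time.Sleep(20 * time.Millisecond)
	return "result"
}

// startQuery is Go's opt-in async: run the blocking call in its own
// goroutine and hand back a channel - a future in all but name.
func startQuery() <-chan string {
	ch := make(chan string, 1) // buffered, so the goroutine never leaks
	go func() { ch <- slowQuery() }()
	return ch
}

func main() {
	future := startQuery() // returns immediately
	// ...do other work here while the query runs...
	fmt.Println(<-future) // block only when the result is needed
}
```

The default (calling `slowQuery()` directly) is synchronous and blocks only the current goroutine; the concurrent version is something you reach for explicitly.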
| ▲ | jeremyjh 2 hours ago | parent [-] | | What GP meant - what everyone means when they say this - is that goroutines are always M:N threading and so there is no such thing as function coloring. In Rust to get M:N threading you have to use async and in practice every library you use has to use async. Hence function coloring, and two separate ecosystems of libraries in the same language. |
| |
| ▲ | baq 4 hours ago | parent | prev [-] | | > not well-suited for things like microcontrollers, where every byte matters except when a RAM fetch is so expensive a load is basically an async call - and it's a single machine code instruction at the same time |
|
|
| ▲ | pkolaczk 5 hours ago | parent | prev | next [-] |
| Threads are neither better nor worse than async+callbacks. They are different. Some problems map nicely to threads, and some are much nicer to express with async. |
| |
| ▲ | groundzeros2015 5 hours ago | parent [-] | | Such as? The entire premise of async is that callbacks were a mistake because they broke sequential reasoning and control. Every explanation of the feature starts with managing callback hell. | | |
| ▲ | repelsteeltje 4 hours ago | parent | next [-] | | Beware, they are different concepts. Threads offer concurrent execution, async (futures) offer concurrent waiting. Loosely speaking, threads make sense for CPU bound problems, while async makes sense for IO bound problems. | | | |
| ▲ | codedokode 4 hours ago | parent | prev [-] | | Callbacks should just be hidden from the programmer; that's what async/await is for. |
|
|
|
| ▲ | swiftcoder 5 hours ago | parent | prev | next [-] |
| > So threads was the right programming model. For problems that aren't overly concerned with performance/memory, yes. You should probably reach for threads as a default, unless you know a priori that your problem is not in this common bucket. Unfortunately there is quite a lot of bookkeeping overhead in the kernel for threads, and context switches are fairly expensive, so in a number of high performance scenarios we may not be able to afford kernel threading |
| |
| ▲ | groundzeros2015 5 hours ago | parent [-] | | In that sentence I’m referring to the abstract idea of a thread of execution as a programming model, not OS threads. A green-thread implementation could provide it too. But what you said about the kernel implementation is true. Still, are we really saying that the primary motivation for async/await is performance? How many programmers would give that answer? How many programs actually hit that bottleneck? Doesn’t that buck the trend of every other language development in the past 20 years, which emphasized correctness and expressiveness over raw performance? | | |
| ▲ | nchie 5 hours ago | parent | next [-] | | > But are we really saying that the primary motivation for async/await is performance? Of course - what else would it be? The whole async trend started because moving away from each http request spawning (or being bound to) an OS thread gave quite extreme improvements in requests/second metrics, didn't it? | | |
| ▲ | groundzeros2015 5 hours ago | parent | next [-] | | I agree. Managing many HTTP requests or responses was a motivating problem. What I question is: 1. Whether most programs resemble that, enough to justify making async an invasive feature of every general-purpose language. 2. Whether programmers are making a conscious choice because they ruled out the perf overhead of the simpler model we get by default. | | |
| ▲ | swiftcoder 5 hours ago | parent [-] | | That is why we have the function colouring problem and a split ecosystem in the first place - if it were obviously better in all cases, we'd make async the default, and get rid of the split altogether (and there are languages, like Erlang, that fall on this side of the fence) |
| |
| ▲ | lukaslalinsky an hour ago | parent | prev [-] | | It was not for performance reasons, but for scaling up. | | |
| |
| ▲ | swiftcoder 5 hours ago | parent | prev | next [-] | | > But are we really saying that the primary motivation for async/await is performance? The original motivation for not using OS threads was indeed performance. Async/await is mostly syntax sugar to fix some of the ergonomic problems of writing continuation-based code (Rust more or less skipped the intermediate "callback hell" with futures that Javascript/Python et al suffered through). | | |
| ▲ | PunchyHamster 3 hours ago | parent [-] | | In some languages, yes; in others (JS/Python), async is just a workaround for not having proper threading. | | |
| ▲ | swiftcoder 2 hours ago | parent [-] | | Python used multiple threads to handle I/O long before async/await was a glimmer in anyone's mind (despite the GIL). nodejs is one of the very few languages I can think of that was born single-threaded and used an asynchronous runtime from the get-go |
|
| |
| ▲ | sureglymop 4 hours ago | parent | prev [-] | | Importantly though, performance might be worse depending on the use case and program. Specifically, scheduling in user space can negatively impact branch prediction, as your CPU is already hyper-optimized for doing things differently. It's all nuanced, and choosing requires careful evaluation. |
|
|
|
| ▲ | codedokode 4 hours ago | parent | prev | next [-] |
| As I understand it, "green threads" are also expensive: for example, you either need to allocate a large stack for each "thread", or hook stack allocation to grow the stack dynamically (like Go does) - and if you grow the stack, you might have to move it, which means you cannot have raw pointers to stack objects. |
| |
| ▲ | lukaslalinsky an hour ago | parent | next [-] | | Green threads are fine for large servers with memory overcommit. Even with static stack sizes, you get benefits over OS threads due to the simpler scheduling. But the post was about embedded and green threads really suck there. Only using as much stack as you need for the task is the perfect solution for embedded systems. | |
| ▲ | kgeist 3 hours ago | parent | prev | next [-] | | >and if you grow the stack, you might have to move it Most stacks are tiny and have bounded growth. Really large stacks usually happen with deep recursion, but it's not a very common pattern in non-functional languages (and functional languages have tail call optimization). OS threads allocate megabytes upfront to accommodate the worst case, which is not that common. And a tiny stack is very fast to copy. The larger the stack becomes, the less likely it is to grow further. >cannot have pointers to stack objects In Go, pointers that escape from a function force heap allocation, because it's unsafe to refer to the contents of a destroyed stack frame later on in principle. And if we only have pointers that never escape, it's relatively trivial to relocate such pointers during stack copying: just detect that a pointer is within the address range of the stack being relocated and recalculate it based on the new stack's base address. | |
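The growable-stack behavior described above is easy to observe. A recursion a million frames deep would likely overflow a typical fixed 8 MB C thread stack, but a goroutine's stack simply grows, being copied (with interior pointers relocated) whenever it outgrows its current allocation:

```go
package main

import "fmt"

// depth recurses n frames. A goroutine starts with a tiny stack (a few KB);
// the runtime grows it on demand by allocating a larger stack, copying the
// old one over, and fixing up any pointers into it - so a million frames
// just works. (Go performs no tail-call optimization, so the frames really
// do accumulate.)
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

func main() {
	done := make(chan int)
	go func() { done <- depth(1_000_000) }()
	fmt.Println(<-done) // 1000000
}
```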
| ▲ | PunchyHamster 3 hours ago | parent | prev [-] | | Works fine in Go. Yes, you're not getting Rust performance (though a good part of that is Rust using all the LLVM goodness vs. Go's own compiler), but performance is good enough and the benefits for developers are great. Having goroutines be so cheap means you don't even need to do anything explicitly async to get what you want. | | |
| ▲ | aw1621107 3 hours ago | parent [-] | | Rust chose a different design space for their async implementation though, so what works well for Go wouldn't work well for Rust. In particular, the Rust devs wanted zero-cost FFI that external code doesn't need to know about, which precludes Go-like green threads. |
|
|
|
| ▲ | dgellow an hour ago | parent | prev | next [-] |
| You don’t have threads on embedded, but you still want a way to express concurrent waiting. Different problems altogether. |
|
| ▲ | pjmlp 3 hours ago | parent | prev | next [-] |
| Proper modern languages offer both: you can keep your threads and reach for async only when it makes sense. The languages that don't offer that choice are another matter. |
|
| ▲ | the__alchemist 2 hours ago | parent | prev | next [-] |
| What is kernel in this context? |
|
| ▲ | hacker_homie 5 hours ago | parent | prev | next [-] |
| I’m just waiting for them to try co-operative multithreading again. |
|
| ▲ | K0nserv 3 hours ago | parent | prev [-] |
| I think you are correct, insofar as N:M threading is often overkill for the problem at hand. However, some I/O-bound problems truly do require it. I haven't kept up with the details, but AFAIK the fallout from Spectre and Meltdown also means context switches are more expensive than they were historically, which is another downside of regular threads. I also want to address something that I've seen in several sub-threads here: Rust's specific async implementation. The key limitation, compared to the likes of Go and JS, is that Rust attempts to implement async as a zero-cost abstraction, which is a much harder problem than what Go and JS solve. Saying some variant of "Rust should just do the same thing as Go" is missing the point. |
| |