| ▲ | newpavlov 5 days ago |
| This is actually one of my many gripes about Rust async and why I consider it a bad addition to the language in the long term. The fundamental problem is that Rust async was developed when epoll was dominant (and almost no one in Rust circles cared about IOCP), and that heavily influenced the async design (sometimes indirectly, through other languages). Think about it for a second. Why do we not have this problem with "synchronous" syscalls? When you call `read` you also "pass a mutable borrow" of the buffer to the kernel, but it maps well onto the Rust ownership/borrow model since the syscall blocks execution of the thread and there is no way to prevent that in user code. With the poll-based async model you side-step this issue since you use the same "sync" syscalls, just ones that are guaranteed to return without blocking. For completion-based IO to work properly with the ownership/borrow model we have to guarantee that the task code will not continue execution until it receives a completion event. You simply can not do that with state machines polled in user code. But the threading model fits here perfectly! If we were to replace threads with "green" threads, user Rust code would look indistinguishable from "synchronous" code. And no, the green threads model can work properly on embedded systems, as demonstrated by many RTOSes. There are several ways we could have done it without making the async runtime mandatory for all targets (the main reason why green threads were removed before Rust 1.0). My personal favorite is the introduction of separate "async" targets. Unfortunately, the Rust language developers made a bet on the unproven polling stackless model because of its promised efficiency, and we are in the process of finding out whether the bet pays off or not. |
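(For concreteness, a minimal sketch of the blocking-`read` point: the borrow handed to the kernel cannot outlive the call, so the borrow checker's picture matches what the kernel actually does.)

    use std::io::{self, Read};
    use std::net::TcpStream;

    fn read_frame(sock: &mut TcpStream) -> io::Result<[u8; 16]> {
        let mut buf = [0u8; 16]; // lives in this stack frame
        // The thread is parked for the whole duration of the syscall, so the
        // kernel can never write into `buf` after this frame has been popped.
        sock.read_exact(&mut buf)?;
        Ok(buf)
    }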
|
| ▲ | duped 5 days ago | parent | next [-] |
> You simply can not do it with state machines polled in user code

That's not really true. The only guarantees in Rust futures are that they are polled() once and must have their Waker's wake() called before they are polled again. A completion-based future submits the request on its first poll and calls wake() on completion. That's kind of the interesting design of futures in Rust - they support both polling and completion. The real conundrum is that futures are not really portable across executors. For io_uring, for example, the executor's event loop is tightly coupled with submission and completion. And due to the instability of a few features (async trait, return impl trait in trait, etc.) there is not really a standard way to write executor-independent async code (you can, and some big crates do, but it's not necessarily trivial). Combine that with the fact that container runtimes disable io_uring by default and most people are deploying async web servers in Docker containers, and it's easy to see why development has stalled. It's also unfair to judge design goals and ideas from 2016 by how the ecosystem evolved over the last decade, particularly given that futures were stabilized before other language items and before the major executors became popular. If you look at the RFCs and blog posts from back then (e.g. https://aturon.github.io/tech/2016/09/07/futures-design/) you can see why readiness was chosen over completion, and how completion can be represented with readiness. He even calls out how naïve completion (callbacks) leads to more allocation on future composition, and points to where green threads were abandoned. |
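(A minimal sketch of such a completion-based future, assuming a hypothetical `submit_io` that hands the request plus the shared state to an io_uring/IOCP driver:)

    use std::future::Future;
    use std::io;
    use std::pin::Pin;
    use std::sync::{Arc, Mutex};
    use std::task::{Context, Poll, Waker};

    struct Shared {
        result: Option<io::Result<usize>>,
        waker: Option<Waker>,
    }

    struct CompletionRead {
        shared: Arc<Mutex<Shared>>,
        submitted: bool,
    }

    impl Future for CompletionRead {
        type Output = io::Result<usize>;

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            let this = self.get_mut(); // fine: all fields are Unpin
            let mut shared = this.shared.lock().unwrap();
            if let Some(res) = shared.result.take() {
                return Poll::Ready(res); // completion already arrived
            }
            if !this.submitted {
                this.submitted = true;
                // submit_io(this.shared.clone()); // hypothetical: enqueue the request
            }
            // The completion handler sets `result`, then does
            // shared.waker.take().map(Waker::wake) - which triggers the re-poll.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }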
| |
| ▲ | newpavlov 5 days ago | parent | next [-] | | No, the fundamental problem (in the context of io-uring) is that futures are managed by user code and can be dropped at any time. This is often referred to as "cancellation safety". Imagine a future has initiated completion-based IO with a buffer that is part of the future's state. User code can simply drop the future (e.g. if it was part of `select!`) and now we have a huge problem on our hands: the kernel will write into a dropped buffer! In the synchronous context it's equivalent to deallocating a thread's stack out from under a thread which is blocked on a synchronous syscall. You obviously can not do that (using safe code) in thread-based code, but it's considered fine in async. This is why you have to use various hacks when using io-uring based executors with Rust async (like using polling mode, or ring-owned buffers with additional data copies). It could be "resolved" on the language level with an additional pile of hacks which would implement async Drop, but, in my opinion, that would only further hurt the consistency of the language.

>He even calls out how naïve completion (callbacks) leads to more allocation on future composition and points to where green threads were abandoned.

I already addressed that in the other comment. | |
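(A sketch of the hazard; `UringSocket`, `read_all`, and `process` stand in for a hypothetical naive io_uring binding that submits an SQE pointing into the future's own state:)

    async fn handle(socket: &UringSocket) -> std::io::Result<()> {
        let mut buf = [0u8; 4096];         // stored inline in the future's state
        socket.read_all(&mut buf).await?;  // the SQE now holds a raw pointer to `buf`
        process(&buf);
        Ok(())
    }

    // If this future loses a race in select!, it is simply dropped: its state
    // (including `buf`) is freed, but the kernel still holds the pointer and
    // will complete the read into freed memory.
    //
    //     select! {
    //         res = handle(&socket) => { /* ... */ }
    //         _ = timeout => { /* use-after-free set up right here */ }
    //     }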
| ▲ | vlovich123 5 days ago | parent | next [-] | | I really don’t understand this argument. If you force the user to transfer ownership of the buffer into the I/O subsystem, the system can move that ownership into the async runtime rather than leaving the buffer held within the cancellable future, and the future hands the buffer back once the completion is received from the kernel. What am I missing? | |
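(This is essentially what tokio-uring does: its read methods take the buffer by value and return it together with the result. A simplified sketch of the shape, with `ring_read` as a hypothetical runtime primitive:)

    use std::io;

    // The runtime, not the future, keeps the buffer alive while the operation
    // is in flight, so dropping the caller's future cannot free it early.
    async fn read_owned(fd: i32, buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
        ring_read(fd, buf).await // hypothetical; hands the buffer back with the result
    }

    // Caller side: ownership round-trips through the await.
    //
    //     let (res, buf) = read_owned(fd, buf).await;
    //     let n = res?;
    //     handle(&buf[..n]);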
| ▲ | Inufu 5 days ago | parent | next [-] | | Requiring ownership transfer gives up on one of the main selling points of Rust: being able to verify reference lifetimes and safety at compile time. If we have to give up on references then a lot of Rust's complexity no longer buys us anything. | |
| ▲ | vlovich123 5 days ago | parent [-] | | I'm not sure what you're trying to say, but the compile-time safety requirement isn't given up. It would look something like:

    self.buffer = io_read(self.buffer)?

This isn't much different than:

    io_read(&mut self.buffer)?

since Rust doesn't permit simultaneous access when a mutable reference is taken. | |
| ▲ | Inufu 4 days ago | parent [-] | | It means you can, for example, no longer do things like getting multiple disjoint references into the same buffer for parallel reads/writes of independent chunks. Or, well, you can, using unsafe, Arc, and Mutex - but at that point the safety guarantees aren't much better than what I get in well-designed C++. Don’t get me wrong, I still much prefer Rust, but I wish async and references worked together better. Source: I recently wrote a high-throughput RPC library in Rust (saturating > 100 Gbit NICs) |
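(Concretely, what is given up: with borrowed buffers the compiler proves disjointness for free; `read_into` below is hypothetical:)

    let mut buf = vec![0u8; 8192];
    // Two disjoint &mut [u8] into one allocation, checked at compile time.
    let (head, tail) = buf.split_at_mut(4096);

    // With a borrow-based API both halves can be filled concurrently:
    //
    //     join!(read_into(conn_a, head), read_into(conn_b, tail));
    //
    // With ownership-transfer APIs each chunk needs its own owned allocation
    // (or Arc/Mutex/unsafe plumbing), which is the regression described above.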
|
| |
| ▲ | newpavlov 5 days ago | parent | prev [-] | | The goal of the async system is to allow users to write synchronous-looking code which is executed asynchronously, with all the associated benefits. "Forcing" users to do stuff like this shows a clear failure to achieve that goal. Additionally, passing ownership like this (instead of passing a mutable borrow) arguably goes against the zero-cost principle. | |
| ▲ | vlovich123 5 days ago | parent [-] | | I don’t follow the zero copy argument. You pass in an owned buffer and get an owned buffer back out. There’s no copying happening here. It’s your claim that async is supposed to look like synchronous code, but I don’t buy it. I don’t see why that’s a goal. Synchronous is an anachronistic software paradigm for a computer hardware architecture that never really existed (electronics are concurrent and asynchronous by nature), and it causes a lot of performance problems to try to make things work that way. Indeed, one thing I’ve always wondered is if you can submit a read request for a page-aligned buffer and have the kernel arrange for data to be written directly into it without any additional copies. That’s probably not possible since there’s routing happening in the kernel and it accumulates everything into sk_buffs. But maybe it could arrange for the framing part of the packet and the data to be decoupled so that it can just give you a mapping into the data region (maybe instead of you even providing a buffer, it gives you back an address mapped into your space). Not sure if that TLB update might be more expensive than a single copy. | |
| ▲ | newpavlov 5 days ago | parent | next [-] | | You have the inevitable overhead of managing the owned buffer, compared against simply passing a mutable borrow of an already existing buffer. Imagine if `io::Read` APIs were constructed as `fn read(&mut self, buf: Vec<u8>) -> io::Result<Vec<u8>>`. Parity with synchronous programming is an explicit goal of Rust async, declared many times (e.g. see https://github.com/rust-lang/rust-project-goals/issues/105). I agree with your rant about the illusion of synchronicity, but it does not matter. The synchronous abstraction is immensely useful in practice, and the less leaky it is, the better. | |
| ▲ | vlovich123 5 days ago | parent | next [-] | | The problem is that "parity" can be interpreted in different ways, and you're choosing to interpret it in a way that doesn't seem to be communicated in the issue you referenced. In fact, the common definition of parity is something like "feature parity", meaning that you can accomplish all the things you did before, even if it's not the same (e.g. macOS has copy-paste feature parity even though it's a different shortcut or might work slightly differently from other operating systems). It rarely means "drop-in" parity where you don't have to do anything. To me it's pretty clear that parity in the issue referenced refers to equivalence parity - that is, you can accomplish the tasks in some way, not that it's a drop-in replacement. I haven't seen it suggested anywhere that async lets you write synchronous code without any changes, nor that integrating completion-style APIs with async will yield code that looks synchronous. For one, completion-style APIs are for performance, and performance APIs are rarely structured for simplicity; they are structured to avoid implicit costs hidden in common leaky (but simpler) abstractions. For another, completion-style APIs in synchronous programming ALSO look different from epoll/select-like APIs, so I really don't understand the argument you're trying to make.

EDIT:

> You have the inevitable overhead of managing the owned buffer, compared against simply passing a mutable borrow of an already existing buffer. Imagine if `io::Read` APIs were constructed as `fn read(&mut self, buf: Vec<u8>) -> io::Result<Vec<u8>>`.

I'm imagining, and I don't see a huge problem in terms of the overhead this implies. And you'd probably not take in a Vec directly but some I/O-specific type, since such an API would be for performance. | |
| ▲ | namibj 5 days ago | parent | prev [-] | | The problem is that the ring requires suitably locked memory which won't be subject to swapping, thus inherently forcing a different memory type if you want the extra-low-overhead/extra-scalable operation. It makes sense to ask the ring wrapper for memory that you can emplace your payload into before submitting the IO if you want to use zero-copy. | | |
| ▲ | surajrmal 5 days ago | parent [-] | | newpavlov seems to operate in a theoretical bubble. Until something like what he describes is published publicly, where we can talk more concretely, it's not worth engaging. Requiring special buffers is very typical for any hardware offload with zero copy. It isn't a leaky abstraction. Async Rust is well designed and I've not seen anything rival it. If there are problems, they're in the libraries built on top, like tokio. |
|
| |
| ▲ | namibj 5 days ago | parent | prev [-] | | Such reads are in principle supported if you have sufficient hardware offloading of your stream. AFAIK io_uring got an update a while back specifically to make this practical for non-stream reads, where you basically provide a slab-allocator region to the ring and reads pick a free slot/slab in that region _only when the data actually arrives_, instead of you tying up DMA-capable memory for as long as the remote takes to send you the data. |
|
|
| |
| ▲ | duped 5 days ago | parent | prev [-] | | That problem exists regardless of whether you want to use stackful coroutines or not. The stack could be freed by user code at any time. It could also panic and drop buffers upon unwinding. I wouldn't call async drop a pile of hacks; it's actually something that would be useful in this context. And that said, there's an easy fix: don't use the pointers supplied by the future! | |
| ▲ | newpavlov 5 days ago | parent [-] | | >That problem exists regardless of whether you want to use stackful coroutines or not. The stack could be freed by user code at any time. It could also panic and drop buffers upon unwinding.

Nope. The problem does not exist in the stackful model by virtue of the user being unable (in safe code) to drop the stack of a stackful task, similarly to how you can not drop the stack of a thread. If you want to cancel a stackful task, you have to send a cancellation signal to it and wait for its completion (i.e. cancellation is fully cooperative). And you fundamentally can not panic while waiting for a completion event; the task code is "frozen" until the signal is received.

>it's actually something that would be useful in this context.

Yes, it's useful to patch a bunch of holes introduced by the Rust async model, and only for that. This is why I call it a bunch of hacks, especially considering the fundamental issues which prevent implementation of async Drop. A properly designed system would have worked properly with the classic Drop.

>And that said, there's an easy fix: don't use the pointers supplied by the future!

It's always amusing when Rust async advocates say that. Let me translate: don't use `let mut buf = [0u8; 16]; socket.read_all(&mut buf).await?;`. If you can't see why such arguments are bonkers, we don't have anything left to talk about. | |
| ▲ | oconnor663 5 days ago | parent | next [-] | | > don't use `let mut buf = [0u8; 16]; socket.read_all(&mut buf).await?;`. If you can't see why such arguments are bonkers, we don't have anything left to talk about.

It doesn't seem bonkers to me. I know you already know these details, but spelling it out: If I'm using select/poll/epoll in C to do non-blocking reads of a socket, then yes, I can use any old stack buffer to receive the bytes, because those are readiness APIs that only write through my pointer "now or never". But if I'm using IOCP/io_uring, I have to be careful to use a buffer that outlives the whole IO loop, because those are completion APIs that write through my pointer "later". This isn't just a question of the borrow checker being smart enough to analyze our code; it's a genuine difference in what correct code needs to do in these two different settings. So if async Rust forces us to use heap-allocated (or otherwise long-lived) buffers to do IOCP/io_uring reads, is that a failure of the async model, or is that just the nature of systems programming? | |
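(The same difference in code; `read_nonblocking` and `submit_read` are hypothetical stand-ins for the two API families:)

    // Readiness style (select/poll/epoll): the kernel writes through the
    // pointer only during the call itself, "now or never", so a buffer in the
    // current stack frame is fine.
    let mut buf = [0u8; 4096];
    let n = read_nonblocking(fd, &mut buf)?; // returns WouldBlock instead of parking

    // Completion style (IOCP/io_uring): the pointer is captured now and
    // written "later", so the buffer must outlive the whole in-flight operation.
    let buf = Box::new([0u8; 4096]); // heap, not this stack frame
    submit_read(fd, buf);            // may be reused/freed only after the completion arrives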
| ▲ | newpavlov 5 days ago | parent [-] | | >is that a failure of the async model This, 100%. Being really generous, it can be called a leaky model which is poorly compatible with completion-based APIs. | | |
| ▲ | vlovich123 5 days ago | parent [-] | | The leaky model is that you could ever receive into a stack buffer, and you're arguing for keeping that model. The reason it's leaky is that copying memory around is supremely expensive. But that's how the BSD socket API from the 90s works, and, by the way, it's something you can make work with async, provided you're fine with memory copies. io_uring is a modern API built for performance, and that's why Rust libraries try to avoid memory copying in the internals. Supporting copying into a stack buffer with io_uring is very difficult to accomplish even in synchronous code. It's not a failure of async but a different programming paradigm altogether. As someone else mentioned, what you really want is to ask io_uring to allocate the pages itself, so that for reads it gives you pages that were allocated by the kernel to be filled directly by HW and then mapped into your userspace process, without any copying by the kernel or any other SW layer involved. | |
| ▲ | zbentley 4 days ago | parent | next [-] | | > what you really want is to ask io_uring to allocate the pages itself so that for reads it gives you pages that were allocated by the kernel Okay, but what about writes? If I have a memory region that I want io_uring to write, it's a major pain in the ass to manage the lifetime of objects in that region in a safe way. My choices are basically: manually manage the lifetime and only allow it to be dropped when I see a completion show up (this is what most everything does now, and it's a) hard to get right and b) limited in many ways, e.g. it's heap-only), or permanently leak that memory as unusable. | | |
| ▲ | vlovich123 4 days ago | parent [-] | | You ask the I/O system for a writable buffer. When you fill it up, you hand it off. Once the I/O finishes, it goes back into the available pool of memory to write with. This is how high-performance I/O works. | |
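(A minimal sketch of that pattern, with hypothetical names: buffers are owned by a pool, loaned out for filling, and returned only once the completion handler confirms the kernel is done with them:)

    struct BufPool {
        free: Vec<Vec<u8>>,
    }

    impl BufPool {
        fn acquire(&mut self) -> Vec<u8> {
            // Reuse a previously completed buffer, or grow the pool.
            self.free.pop().unwrap_or_else(|| vec![0u8; 64 * 1024])
        }

        // Called from the completion path, never from a (cancellable) task, so
        // a dropped task can never free memory the kernel still references.
        fn release(&mut self, buf: Vec<u8>) {
            self.free.push(buf);
        }
    }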
| ▲ | zbentley 3 days ago | parent [-] | | Okay, but . . . how would that work? A syscall gives back a pointer (I thought the point was to avoid syscalls/context switches)? An io_malloc userspace function (great, now how do I manage lifetimes of the buffers it hands out)? Something else? |
|
| |
| ▲ | Asmod4n 5 days ago | parent | prev [-] | | There is just one catch: using the feature to let io_uring handle buffers for you limits you to the memlock limit of the process, which is 8 MB on a typical Debian install (more on others). And that's a hard limit unless you have root access to said machine. | |
| ▲ | vlovich123 5 days ago | parent [-] | | Sure, that's the most efficient way. But you can still have the user allocate a read buffer, pass it to the read API, and receive it on the way out. In fact, unlike what OP claimed, this is actually more efficient, since the implementation can safely avoid unnecessarily initializing the buffer (by truncating it to the length read before returning), whereas safely using uninitialized buffers is otherwise kind of tricky. |
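(A sketch of why the owned API can skip the zeroing, assuming a hypothetical `ring_read_spare` primitive that fills a buffer's spare capacity inside the runtime, where the operation always runs to completion:)

    use std::io;

    async fn read_owned(fd: i32, mut buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
        // The kernel is handed buf's *uninitialized* spare capacity directly...
        match ring_read_spare(fd, &mut buf).await {
            Ok(n) => {
                // ...and only the bytes it actually wrote become visible.
                // SAFETY: the completion guarantees `n` bytes were initialized.
                unsafe { buf.set_len(buf.len() + n) };
                (Ok(n), buf)
            }
            Err(e) => (Err(e), buf),
        }
    }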
|
|
|
| |
| ▲ | duped 5 days ago | parent | prev [-] | | > The problem does not exist in the stackful model by virtue of the user being unable (in safe code) to drop the stack of a stackful task, similarly to how you can not drop the stack of a thread.

If you're not doing things better than threads, then why don't you just use threads?

> And you fundamentally can not panic while waiting for a completion event; the task code is "frozen" until the signal is received.

So you only allow join/select at the task level? Sounds awful!

> Let me translate: don't use `let mut buf = [0u8; 16]; socket.read_all(&mut buf).await?;`

Yes, exactly. It's more like `let buf = socket.read(16);` | |
| ▲ | newpavlov 5 days ago | parent [-] | | >If you're not doing things better than threads, then why don't you just use threads?

Because green threads are more efficient than classical threads. You have less context switching, more control over concurrency (e.g. you can have application-level pseudo critical sections and tools like `join!`/`select!`), and with io-uring you have a much smaller number of syscalls. In other words, the memory footprint would be similar to classical threads, but runtime performance can be much higher.

>So you only allow join/select at the task level? Sounds awful!

What is the difference compared to join/select at the future level? Yes, with the most straightforward implementation you have to allocate a full stack for each sub-task (somewhat equivalent to boxing sub-futures). But it's theoretically possible to use the parent task's stack for sub-task stacks with the aforementioned compiler improvements. Another difference is that instead of just dropping the future state on the floor, you have to explicitly send a cancellation signal (e.g. based on `IORING_OP_ASYNC_CANCEL`) and wait for the sub-task to finish. Performance-wise it should differ minimally from the hypothetical async Drop.

>Yes, exactly.

Ok, I have nothing more to add then. |
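(A sketch of what task-level select with cooperative cancellation might look like in the stackful design being described; every API here is hypothetical:)

    // Each arm runs on its own (sub)stack; cancellation is a signal plus a
    // join, never a drop.
    let req = spawn_subtask(|| socket.read_request());         // may block on io_uring
    let tmo = spawn_subtask(|| sleep(Duration::from_secs(5)));

    match select2(&req, &tmo) {
        Selected::First => {
            tmo.cancel();      // e.g. backed by IORING_OP_ASYNC_CANCEL
            tmo.join();        // wait for the CQE before its stack is freed
            handle(req.join());
        }
        Selected::Second => {
            req.cancel();
            req.join();        // the buffer on req's stack stays alive until here
        }
    }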
|
|
|
| |
| ▲ | hobofan 5 days ago | parent | prev | next [-] | | > The only guarantees in Rust futures are that they are polled() once and must have their Waker's wake() called before they are polled again.

I just had to double-check as this sounded strange to me, and no, that's not true. The most efficient design is to do it that way, yes, but there are no guarantees of that sort. If one wants to build a less efficient executor, it's perfectly permissible to just poll futures in a tight loop without involving the Waker at all. | |
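(A legal-but-wasteful executor demonstrating the point; a minimal sketch using the no-op Waker from recent std:)

    use std::future::Future;
    use std::pin::pin;
    use std::task::{Context, Poll, Waker};

    // Polls in a tight loop and ignores wake() entirely. Inefficient, but it
    // violates no contract of the Future trait.
    fn busy_block_on<F: Future>(fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let mut cx = Context::from_waker(Waker::noop());
        loop {
            if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
                return out;
            }
            std::hint::spin_loop();
        }
    }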
| ▲ | duped 5 days ago | parent [-] | | Let me rephrase: there's no guarantee that poll() is called again (because of cancel safety), and in practice you have to call wake() because executors won't reschedule the task unless one of its children calls wake(). |
| |
| ▲ | 0x457 5 days ago | parent | prev [-] | | > And due to the instability of a few features (async trait, return impl trait in trait, etc.) there is not really a standard way to write executor-independent async code (you can, and some big crates do, but it's not necessarily trivial).

Um, all of that is just sugar on top of stable features. None of those features, nor their absence, prevents portability. Full portability isn't possible specifically because of how Waker works (i.e. it is implementation-specific). That's what allows async to work with different styles of async IO. The reason io_uring is hard in Rust is io_uring's way of dealing with memory. |
|
|
| ▲ | kibwen 5 days ago | parent | prev | next [-] |
> The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP)

No, this is a mistaken retelling of history. The Rust developers were not ignorant of IOCP, nor were they zealous about any specific async model. They went looking for a model that fit with Rust's ethos, and completion didn't fit. Aaron Turon has an illuminating post from 2016 explaining their reasoning: https://aturon.github.io/tech/2016/09/07/futures-design/

See the section "Defining futures":

"There’s a very standard way to describe futures, which we found in every existing futures implementation we inspected: as a function that subscribes a callback for notification that the future is complete. Note: In the async I/O world, this kind of interface is sometimes referred to as completion-based, because events are signaled on completion of operations; Windows’s IOCP is based on this model. [...] Unfortunately, this approach nevertheless forces allocation at almost every point of future composition, and often imposes dynamic dispatch, despite our best efforts to avoid such overhead. [...] TL;DR, we were unable to make the “standard” future abstraction provide zero-cost composition of futures, and we know of no “standard” implementation that does so. [...] After much soul-searching, we arrived at a new “demand-driven” definition of futures."

I'm not sure where this meme came from where people seem to think that the Rust devs rejected a completion-based scheme because of some emotional affinity for epoll. They spent a long time thinking about the problem, and came up with a solution that worked best for Rust's goals. The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus. |
| |
| ▲ | newpavlov 5 days ago | parent | next [-] | | >which we found in every existing futures implementation we inspected

This is exactly what I meant when I wrote about the indirect influence from other languages. People may dress it up as much as they want, but it's clear that polling was the most important model at the time (outside of the Windows world) and a lot of design consideration was put into being compatible with it. The Rust async model literally uses the polling terminology in its most fundamental interfaces!

>this approach nevertheless forces allocation at almost every point of future composition

This is only true in the narrow world of modeling async execution with futures. Do you see heap allocations in Go on each equivalent of "future composition" (i.e. every function call)? No, you do not. With stackful models you allocate a full stack for your task and you model function calls as plain function calls, without any future-composition shenanigans. Yes, the stackless model is more efficient memory-wise and allows for some additional useful tricks (like sharing future stacks in `join!`). But the stackful model is perfectly efficient for 95+% of use cases, fits better with the borrow/ownership model, does not result in the `.await` noise, does not lead to the horrible ecosystem split (including the split between different executors), and does not need language-breaking hacks like `Pin` (see the `noalias` exception made for it). And I believe it's possible to close the memory efficiency gap between the models with certain compiler improvements (tracking a maximum stack usage bound for functions and introducing a separate async ABI with two separate stacks).

>The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus.

IIRC the first usable versions of io-uring were released approximately at the time when Rust async was undergoing stabilization. I am really confident that if the async system were designed today we would've had a totally different model. The importance of completion-based models has only grown since then, not only because of sane async file IO, but also because of Spectre and Meltdown. | |
| ▲ | kibwen 5 days ago | parent [-] | | > But the stackful model is

The existence of advantages doesn't change anything here. The problem is that the disadvantages made this approach a non-starter, despite a lot of effort to make it work. Tradeoffs exist in language design, and the approaches were judged accordingly. What works for Go doesn't necessarily work for Rust, because they target different domains.

> I am really confident that if the async system were designed today we would've had a totally different model

No, without solving the original problems, the outcome would be the same. The Rust devs at the time were well aware of io_uring. | |
| ▲ | no_wizard 5 days ago | parent [-] | | What were the original problems, exactly? From what I recall they effectively boiled down to size concerns: due to seeing themselves as a C/C++ successor, they didn't want to lose any adoption in the embedded systems target audience. | |
| ▲ | kibwen 5 days ago | parent | next [-] | | Have you read the article by Aaron Turon linked above? It's very informative, and if you have any questions about specific parts of it, feel free to reference them. In particular it boils down to the fact that Rust bends over backwards to avoid putting anything that requires allocation or dynamic dispatch in the core language (e.g. Rust's closures are fascinating in that they're stack-allocated, like C++'s, while also playing nicely with the borrow checker, which is quite a feat). This property extends to the current design of async, which makes async suitable for embedded devices, which is extremely cool (check out the Embassy project for the state of the art in this space). | |
| ▲ | const_cast 5 days ago | parent | prev [-] | | I mean, from an outsider's perspective on Rust this is how I saw it. Rust is in a strange place because it's a systems language directly competing with C++. Async, in general, doesn't vibe with that, but green threads definitely don't. If you're gonna do green threads you might as well throw in a GC too and get a whole runtime. And now you're writing Go. | |
| ▲ | no_wizard 5 days ago | parent | next [-] | | I don't think doing green threads equates to 'well, might as well have a GC now!'. I think they made the wrong tradeoff too, because hardware will inevitably catch up to the language requirements, especially if it's desirable to use. Not to mention that over time things can be made more efficient from the Rust side as well, with compiler improvements, better programming techniques, etc. I think they made the wrong bet, personally. Having worked in enough languages that have function coloring problems, I would avoid it in a language design as a line-in-the-sand item, regardless of tradeoffs. | |
| ▲ | surajrmal 5 days ago | parent | next [-] | | There are other languages with green threads and folks are free to use those. Zig is trying to do interesting things with stackful coroutines. I don't think I nor most systems programmers would have chosen rust if it required green threads instead of stackless coroutines for async. If you work on embedded or low level environments like kernels and whatnot, you need something that falls back to callbacks for async. I'm sure folks who work on servers would have been fine with green threads but they were not the target audience for rust. Being upset because you fall outside the target demographic of a particular language doesn't mean they made the wrong choice. It just means you should look for something else. | |
| ▲ | fpoling 4 days ago | parent | prev [-] | | Hardware does not catch up with language requirements. If anything, it is languages/compilers that catch up with hardware, like SSE instructions and loop parallelism. For me the mistake that Rust made was that it tried too hard to behave like C/C++ with its single execution stack. Ada uses two stacks, allowing a callee to return stack-allocated arrays to the caller. Not only does it allow avoiding dynamic allocations in many cases where C++ allocates memory, but it also reduces the need for pointers, making the code safer even without the borrow checker. If, instead of async, Rust had spent its efforts on implementing something like that, or even on allowing explicit stack control from safe code so green threads or coroutines could be implemented as a library, it could be more compatible with the io_uring world. | |
| ▲ | zozbot234 4 days ago | parent [-] | | > Ada uses two stacks allowing a callee to return a stack-allocated arrays to the caller. You could do this manually by threading a pointer to a separately-allocated stack (could be on the heap or perhaps just a static allocation) as an extra function parameter. It's just a very simple case of arena allocation, with similar advantages and disadvantages. (For example, the caller must ensure that enough space is available on the dynamic-data stack for anything that the callee might want to push onto it.) In general it's just not really worth it, because it turns out that dynamically-sized data that one would not want to simply place on the heap is rare anyway. |
|
| |
| ▲ | zozbot234 5 days ago | parent | prev [-] | | On the contrary, stackless async can "vibe" quite well with deep embedded workloads that also require a low-level language like C/C++. There are very few meaningful alternatives to Rust in that space. |
|
|
|
| |
| ▲ | fpoling 4 days ago | parent | prev [-] | | That post explicitly stated that one of the goals was to avoid requiring heap allocations. But fundamentally io_uring is incompatible with the stack, and in practice coding against it requires dynamic allocations. If that had been known 10 years ago, surely it would have influenced the design goals. |
|
|
| ▲ | withoutboats3 5 days ago | parent | prev [-] |
| genuinely so sad to me that you are still grinding this axe. if your fantasy design works so much better - go build it then! |
| |
| ▲ | newpavlov 5 days ago | parent [-] | | Deal with it. Async is my greatest disappointment in an otherwise mostly stellar language. And I will continue to argue strongly against it. After Rust has raised the level of quality and expectations to such a great level, async feels like three steps back, with all those "you are holding it wrong" arguments, footguns, and piles of hacks. And this sentiment is shared by many others. It's really disappointing to see how many resources are getting sunk into the flawed async model by both the language and the ecosystem developers.

>go build it then

I did build it, and it's in the process of being adopted into a proprietary database (theoretically a prime use-case for async Rust). Sadly, because I don't have ways to change the language and the compiler, it has obvious limitations (and generally it can be called unsound, especially around thread locals). It works for our project only because we have a tightly controlled code base. In the future I plan to create a custom "green-thread" fork of `std` to ease the limitations a bit. Because of the limitations (and the proprietary nature of the project) it is unlikely to be published as an open source project. Amusingly, during online discussions I've seen other unrelated people who have done similar stuff. | |
| ▲ | mbStavola 5 days ago | parent | next [-] | | > it has obvious limitations (and generally it can be called unsound, especially around thread locals)

Is this really better than what we have now? I don't think async is perfect, but I can see what tradeoffs they are currently making and how they plan to address most if not all of them. "General" unsoundness seems like a rather large downside.

> In the future I plan to create a custom "green-thread" fork of `std` to ease the limitations a bit

Can you go more in-depth into these limitations and which would be alleviated by having first-class support for your approach in the compiler/std? | |
| ▲ | newpavlov 5 days ago | parent [-] | | >Is this really better than what we have now?

Depends on the metric you use. Memory-wise it's a bit less efficient (our tasks usually are quite big, so the relative overhead is small in our case); runtime-wise it should be on par or slightly ahead. From the source code perspective, in my opinion, it's much better. We don't have the async/await noise everywhere, and after development of the `std` fork we will get async in most dependencies as well for "free" (we would still need to inspect the code to see that they do not use blocking `libc` calls, for example). I always found it amusing that people use "sync" `log`-based logging in their async projects; we will not have this problem. The approach also allows migration of tasks across cores even if you keep an `Rc` across yield points. And of course, we do not need to duplicate traits with their async counterparts, and Drop implementations with async operations work properly out of the box.

>Can you go more in-depth into these limitations and which would be alleviated by having first-class support for your approach in the compiler/std?

The most obvious example is thread locals. Right now we have to ensure that code does not wait on completion while holding a thread-local reference (we allow migration of tasks across workers/cores by default). We ban the use of thread locals in our code and assume that dependencies are unable to yield into our executor. With a forked `std` we can replace the `thread_local!` macro with a task-local implementation, which would resolve this issue.

Another source of potential unsoundness is reuse of the parent task's stack for sub-task stacks in our implementation of `select!`/`join!` (we have separate variants which allocate full stacks for sub-tasks, used for "fat" sub-tasks). Right now we have to provide the stack size for sub-tasks manually and check that the value is correct using external tools (we use raw syscalls for interacting with io-uring and forbid external shared library calls inside sub-tasks). This could be resolved with the aforementioned special async ABI and tracking of a maximum stack usage bound.

Finally, our implementation may not work out of the box on Windows (I read that it has protections against messing with the stack pointer, on which we rely), but it's not a problem for us since we target only modern Linux. | |
| ▲ | surajrmal 5 days ago | parent [-] | | If you use a custom libc and dynamic linker, you can very easily customize thread locals to work the way you want without forking the standard library. |
|
| |
| ▲ | mikevm 5 days ago | parent | prev [-] | | [dead] |
|
|