Looks really cool and I'm going to take a closer look tonight!

How do you do the context switching between coroutines? getcontext/setcontext, or something more architecture specific? I'm currently working on some stackful coroutine stuff and the swapcontext calls actually take a fair amount of time, so I'm planning on writing a custom one that doesn't preserve unused bits (signal mask and FPU state). So I'm curious about your findings there

▲

ambonvik 5 hours ago | parent [-]

Hi, it is hand-coded assembly. Pushing all necessary registers to the stack (including GS on Windows), swapping the stack pointer to/from memory, popping the registers, and off we go on the other stack. I save FPU flags, but not more FPU state than necessary (which again is a whole lot more on Windows than on Linux).

Others have done this elsewhere, of course. There are links/references to several other examples in the code. I mention two in particular in the NOTICE file, not because I copied their code, but because I read it very closely and followed the outline of their examples. It would probably taken me forever to figure out the Windows TIB on my own.

What I think is pretty cool (biased as I am) in my implementation is the «trampoline» that launches the coroutine function and waits silently in case it returns. If it does, it is intercepted and the proper coroutine exit() function gets called.

▲

anematode 5 hours ago | parent [-]

Interesting. How does the trampoline work?

I'm wondering whether we could further decrease the overhead of the switch on GCC/clang by marking the push function with `__attribute__((preserve_none))`. Then among GPRs we only need to save the base and stack pointers, and the callers will only save what they need to

▲

ambonvik 4 hours ago | parent [-]

It is an assembly function that does not get called from anywhere. I pre-load the stack image with its intended register content from C, including the trampoline function address as the "return address". On the first transfer to the newly created coroutine, that gets loaded, which in turn calls the coroutine function that suddenly is in one of its registers along with its arguments. If the coroutine function ever returns, that just continues the trampoline function, which proceeds to call the coroutine_exit() function, whose address also just happens to be stored in another handy register.

https://github.com/ambonvik/cimba/blob/main/src/port/x86-64/...

▲

anematode 4 hours ago | parent [-]

Ahhh ok. Cool!

Do sanitizers (ASan/UBSan/valgrind) still work in this setting? Also I'm wondering if you'll need some special handling if Intel CET is enabled

▲

ambonvik 4 hours ago | parent [-]

They probably do, but I have not used them. My approach has been "offensive programming", to put in asserts for preconditions, invariants, and postconditions wherever possible. If anything starts to smell, I'd like to stop it cold and fix it rather than trying to figure it out later. With two levels of concurrency in a shared memory space it could get hairy fast if bugs were allowed to propagate to elsewhere before crashing something.

Not familiar with the details of Intel CET, but this is basically just implementing what others call fibers or "green threads", so any such special handling should certainly be possible if necessary.

	▲	anematode 4 hours ago \| parent [-]
		Cool. I have faith in thorough testing for the coroutines' correctness, but sanitizers would be convenient for people debugging their own code that leverages this library. I know that ASan doesn't support getcontext et al., but maybe this is different