Asmod4n 2 days ago

It’s manageable with eBPF instead of seccomp so one has to adapt to that. Should be doable.

georgyo 2 days ago | parent [-]

Maybe not so doable. The whole point of io_uring is to reduce syscalls, so you end up with just three: io_uring_setup, io_uring_register, and io_uring_enter.

There is now a memory buffer that both user space and the kernel read and write, and with that buffer you can _always_ do any syscall that io_uring supports. Things like strace, eBPF, and seccomp cannot see the actual syscalls that are being requested through that memory buffer.

And, having something like seccomp or eBPF inspect the stream might slow it down enough to eat the performance gain.
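To make the visibility problem concrete, here is a toy Python model, not the real kernel interface. `ToyKernel`, `seen_by_filter`, and `run_demo` are made-up names for illustration. It assumes a filter that runs only at syscall entry and sees only the syscall name, which is roughly how a seccomp filter behaves.

```python
from collections import deque


class ToyKernel:
    """Toy stand-in for the kernel; every name here is hypothetical."""

    def __init__(self, denied_syscalls):
        self.denied = set(denied_syscalls)
        self.seen_by_filter = []  # what a syscall-entry filter gets to observe
        self.sq = deque()         # stands in for the shared submission ring

    def syscall(self, name, *args):
        # The filter runs here, at syscall entry, and sees only the name.
        self.seen_by_filter.append(name)
        if name in self.denied:
            raise PermissionError(name)
        if name == "io_uring_enter":
            return self._drain_ring()
        return f"{name} done"

    def _drain_ring(self):
        # The kernel consumes queued operations without a syscall entry
        # per operation, so the filter above is never consulted for them.
        performed = []
        while self.sq:
            performed.append(self.sq.popleft())
        return performed


def run_demo():
    kernel = ToyKernel(denied_syscalls={"openat"})

    # A direct openat is caught by the filter.
    try:
        kernel.syscall("openat", "/etc/passwd")
        direct = "allowed"
    except PermissionError:
        direct = "blocked"

    # Writing entries into shared memory is not a syscall at all.
    kernel.sq.append(("OPENAT", "/etc/passwd"))
    kernel.sq.append(("READ", 3))
    performed = kernel.syscall("io_uring_enter")

    return direct, performed, kernel.seen_by_filter


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the filter's log contains only `openat` and `io_uring_enter`, while the open and read queued in the ring are still carried out.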

to_ziegler 2 days ago | parent | next [-]

There is some ongoing research on eBPF and io_uring that you might find interesting, e.g., RingGuard: Guarding io_uring with eBPF (https://dl.acm.org/doi/10.1145/3609021.3609304).

Asmod4n 2 days ago | parent | prev | next [-]

Ain’t eBPF hooks there so you can limit what a cgroup/process can do, no matter what API it’s calling? Like disallowing opening files or connecting sockets altogether.

actionfromafar 2 days ago | parent | prev [-]

So io_uring is like transactions in sql but for syscalls?

topspin a day ago | parent [-]

No. A batch of submission queue entries (SQEs) can be partially completed, whereas an ACID database transaction is all or nothing. The syscalls performed by SQEs have side effects that can't reasonably be undone. Failures of operations performed by SQEs don't stop or roll back anything.

Think of io_uring as a pair of unidirectional pipes. You shove syscalls and (pointers to) data into one pipe and the results (asynchronously) gush out of the other pipe, errors and all. Each pipe is actually a separate block of memory shared between your process and the kernel: you scribble in one and read from the other, and the kernel does the opposite.
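A rough Python sketch of that two-pipe picture, which also shows why a batch is not a transaction. This is a toy model, not liburing or the kernel ABI: `ToyRing`, `submit`, `kernel_process`, `reap`, and `demo` are hypothetical names. It borrows one real convention, that a failed operation reports a negative errno as its result. Unlike this toy, real completions are not guaranteed to arrive in submission order.

```python
import errno
from collections import deque


class ToyRing:
    """Two one-way queues standing in for the shared-memory rings."""

    def __init__(self):
        self.sq = deque()  # submission queue: process writes, kernel reads
        self.cq = deque()  # completion queue: kernel writes, process reads

    def submit(self, user_data, op):
        # The process side: queue an operation tagged with user_data.
        self.sq.append((user_data, op))

    def kernel_process(self):
        # The kernel side: run every queued operation. A failure is
        # reported as a result; it does not stop or undo the others.
        while self.sq:
            user_data, op = self.sq.popleft()
            try:
                res = op()
            except OSError as exc:
                res = -exc.errno
            self.cq.append((user_data, res))

    def reap(self):
        # The process side: collect all completions currently available.
        results = list(self.cq)
        self.cq.clear()
        return results


def demo():
    written = []  # side effects that cannot be rolled back

    def write(data):
        def op():
            written.append(data)
            return len(data)
        return op

    def open_missing():
        raise OSError(errno.ENOENT, "no such file")

    ring = ToyRing()
    ring.submit("a", write("hello"))
    ring.submit("b", open_missing)
    ring.submit("c", write("world"))
    ring.kernel_process()
    return ring.reap(), written


if __name__ == "__main__":
    print(demo())
```

Here the middle operation fails, yet both writes still happen and all three results come back through the completion queue, errors and all.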