| ▲ | hayd 2 days ago |
| Is this something likely to ever change? |
|
| ▲ | topspin 2 days ago | parent | next [-] |
| I believe it's possible, but that it's a hard problem requiring great effort. I believe this is an opportunity to apply formal methods à la seL4, that nothing less will be sufficient, and that the value of io_uring is great enough to justify it. That will take a lot of talent and hours. I admire io_uring. I appreciate the fact that it exists and continues despite the security problems; that is evidence that security "concerns" don't (yet) have a veto over all things Linux. The design isn't novel. High-performance hardware (NICs, HBAs, codecs, etc.) has used similar techniques for a long time. io_uring only brings this to user space and generalizes it. I imagine an OS and hardware that fully embrace the pattern, obviating the need for context switches, interrupts, blocking and other conventional approaches we've slouched into since the inception of computing. |
| |
| ▲ | quotemstr 2 days ago | parent [-] | | Alternatively, it requires cloud providers and such losing business if they refuse to support the latest features. The "surface area" argument against io_uring can apply to literally any innovation. Over on LWN, there's an article on path traversal difficulties that mentions how, because openat2(2) is often banned as inconvenient to whitelist using seccomp, people have to work around path traversal bugs using fiddly, manual, and slow element-by-element path traversal in user space. Ridiculous security theater. A new system call had a vulnerability in 2010 and so we're never able to take practical advantage of new kernel features? (It doesn't help that gvisor refuses to acknowledge the modern world.) Great example of descending into a shitty equilibrium because the great costs of a bad policy are diffuse but the slight benefits are concentrated. The only effective lever is commercial pressure. All the formal methods in the world won't help when the incentive structure reinforces technical obstinacy. |
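To make the "element-by-element path traversal in user space" concrete, here is a minimal sketch of what such a workaround tends to look like. The helper name `open_beneath` and the `demo` scaffolding are hypothetical, not taken from the LWN article or any particular project; the sketch assumes a Unix system where `os.open` accepts `dir_fd`. It issues one open call per path component with `O_NOFOLLOW`, which is roughly the work that `openat2(2)` with `RESOLVE_BENEATH` can do in a single call.

```python
import os
import tempfile


def open_beneath(root_fd, relpath, flags=os.O_RDONLY):
    """Open relpath below root_fd one component at a time.

    Hypothetical helper: rejects '..' and refuses to follow symlinks.
    """
    parts = [p for p in relpath.split("/") if p not in ("", ".")]
    if not parts:
        raise ValueError("empty path")
    if ".." in parts:
        raise PermissionError("'..' is not allowed")

    dir_fd = os.dup(root_fd)  # private copy so the caller's fd stays open
    try:
        for name in parts[:-1]:
            # One open call per directory component, never following a symlink.
            next_fd = os.open(
                name,
                os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                dir_fd=dir_fd,
            )
            os.close(dir_fd)
            dir_fd = next_fd
        # The final component is also opened without following a symlink.
        return os.open(parts[-1], flags | os.O_NOFOLLOW, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)


def demo():
    """Build a scratch tree with a symlink that points outside the root."""
    with tempfile.TemporaryDirectory() as root, \
            tempfile.TemporaryDirectory() as outside:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "sub", "file.txt"), "wb") as f:
            f.write(b"ok")
        with open(os.path.join(outside, "secret.txt"), "wb") as f:
            f.write(b"secret")
        os.symlink(outside, os.path.join(root, "escape"))

        root_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
        try:
            fd = open_beneath(root_fd, "sub/file.txt")
            data = os.read(fd, 16)
            os.close(fd)
            try:
                open_beneath(root_fd, "escape/secret.txt")
                blocked = False
            except OSError:
                blocked = True  # the symlinked component was refused
        finally:
            os.close(root_fd)
    return data, blocked
```

Every directory level costs a separate syscall plus file descriptor bookkeeping, which is where the "fiddly" and "slow" come from.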
|
|
| ▲ | charcircuit 2 days ago | parent | prev | next [-] |
| It already did with the io_uring worker rewrite in 5.12 (2021), which made it much safer. https://github.com/axboe/liburing/discussions/1047 |
| |
| ▲ | topspin a day ago | parent [-] | | I can't agree with this. There is ample evidence of serious flaws since 2021. I hate that. I wish it weren't true. But an objective analysis of the record demands that view. Here is a fun one from September (CVE-2025-39816): "io_uring/kbuf: always use READ_ONCE() to read ring provided buffer lengths." That is an attacker's wet dream right there: bump the length and exfiltrate sensitive data. And it wasn't just some short-lived "Linus's branch" work no one actually ran: it existed for a time in, for example, Ubuntu 24.04 LTS (released in 2024). I just cherry-picked that one from among many. |
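The general bug class that the fix title hints at is a double fetch: a length that lives in memory shared with another party is read once to validate it and again to use it. The sketch below is a toy model of that class only. It is not the kernel code and makes no claim about the exact code path of the CVE; `SharedBuffer`, `KERNEL_MEMORY`, and both copy functions are invented for illustration.

```python
class SharedBuffer:
    """Toy stand-in for a length field that another party can rewrite.

    The first read returns the honest value; every later read returns
    the attacker's value, simulating a concurrent write between reads.
    """

    def __init__(self, honest_length, attacker_length):
        self._reads = 0
        self._honest = honest_length
        self._attacker = attacker_length

    @property
    def length(self):
        self._reads += 1
        return self._honest if self._reads == 1 else self._attacker


# Four bytes the caller may see, followed by bytes it must never see.
KERNEL_MEMORY = b"data" + b"SECRET"


def copy_double_fetch(buf, limit=4):
    """Buggy pattern: the length is fetched twice."""
    if buf.length > limit:              # fetch 1: validation
        raise ValueError("length too large")
    return KERNEL_MEMORY[:buf.length]   # fetch 2: use, may have changed


def copy_single_fetch(buf, limit=4):
    """Fixed pattern: fetch once into a local, then validate and use it."""
    n = buf.length                      # single fetch, in the spirit of READ_ONCE()
    if n > limit:
        raise ValueError("length too large")
    return KERNEL_MEMORY[:n]
```

With `SharedBuffer(4, 10)`, the double-fetch version passes the check with 4 and then copies 10 bytes, while the single-fetch version copies only the 4 it validated.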
|
|
| ▲ | Asmod4n 2 days ago | parent | prev [-] |
| It’s manageable with eBPF instead of seccomp so one has to adapt to that. Should be doable. |
| |
| ▲ | georgyo 2 days ago | parent [-] | | Maybe not so doable. The whole point of io_uring is to reduce syscalls, so you end up with just three: io_uring_setup, io_uring_register, and io_uring_enter. There is now a memory buffer that both user space and the kernel read, and with that buffer you can _always_ do any syscall that io_uring supports. Things like strace, eBPF, and seccomp cannot see the actual syscalls being requested in that memory buffer. And having something like seccomp or eBPF inspect the stream might slow it down enough to eat the performance gain. | | |
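The visibility problem described above can be shown with a toy model. Nothing here touches the real kernel: `SyscallBoundaryFilter`, `direct_openat`, and `ring_openat` are hypothetical names, and the "kernel" is just a Python loop. The only assumption carried over from the comment is that a syscall-boundary filter sees the name of the syscall at kernel entry and not the requests queued in shared memory.

```python
class SyscallBoundaryFilter:
    """Hypothetical filter that sees only the syscall name at kernel entry."""

    def __init__(self, denied):
        self.denied = set(denied)
        self.seen = []  # every syscall name the filter was shown

    def check(self, name):
        self.seen.append(name)
        if name in self.denied:
            raise PermissionError(f"{name} denied by filter")


def direct_openat(filt, path):
    """Classic path: the operation itself crosses the syscall boundary."""
    filt.check("openat")
    return f"opened {path}"


def ring_openat(filt, shared_queue, path):
    """Ring path: the request is written to shared memory, then one
    generic syscall tells the toy kernel to go and look at the queue."""
    shared_queue.append(("OPENAT", path))  # plain memory write, no syscall
    filt.check("io_uring_enter")           # the only thing the filter sees
    results = [f"opened {p}" for op, p in shared_queue if op == "OPENAT"]
    shared_queue.clear()
    return results


def demo():
    filt = SyscallBoundaryFilter(denied={"openat"})
    try:
        direct_openat(filt, "/etc/passwd")
        direct_blocked = False
    except PermissionError:
        direct_blocked = True
    ring_results = ring_openat(filt, [], "/etc/passwd")
    return direct_blocked, ring_results, filt.seen
```

In this model the filter blocks the direct `openat` but lets the same operation through the ring, because all it ever observes is `io_uring_enter`.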
| ▲ | to_ziegler 2 days ago | parent | next [-] | | There is some ongoing research on eBPF and io_uring that you might find interesting, e.g., RingGuard: Guarding io_uring with eBPF (https://dl.acm.org/doi/10.1145/3609021.3609304). | |
| ▲ | Asmod4n 2 days ago | parent | prev | next [-] | | Ain’t eBPF hooks there so you can limit what a cgroup/process can do, no matter what API it’s calling? Like disallowing opening files or connecting sockets altogether. | |
| ▲ | actionfromafar 2 days ago | parent | prev [-] | | So io_uring is like transactions in SQL but for syscalls? | | |
| ▲ | topspin a day ago | parent [-] | | No. A batch of submission queue entries (SQEs) can be partially completed, whereas an ACID database transaction is all or nothing. The syscalls performed by SQEs have side effects that can't reasonably be undone. Failures of operations performed by SQEs don't stop or roll back anything. Think of io_uring as a pair of unidirectional pipes. You shove syscalls and (pointers to) data into one pipe and the results (asynchronously) gush out of the other pipe, errors and all. Each pipe is actually a separate block of memory shared between your process and the kernel: you scribble in one and read from the other, and the kernel does the opposite. |
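The two-pipe picture and the "no rollback" point can be sketched with a toy model. This is not the io_uring API: `ToyRing`, `write_file`, and `demo` are invented for illustration, the queues are ordinary Python deques instead of shared memory, and the "kernel" runs synchronously. What it does keep from the real interface is the shape: requests go in one queue tagged with caller-chosen user data, results come out of the other, and a failure is reported as a negative errno without affecting neighbouring entries.

```python
import os
import tempfile
from collections import deque


class ToyRing:
    """Toy model of a submission queue and a completion queue."""

    def __init__(self):
        self.sq = deque()  # application writes, toy kernel reads
        self.cq = deque()  # toy kernel writes, application reads

    def submit(self, user_data, op, *args):
        self.sq.append((user_data, op, args))

    def kernel_process(self):
        """Run every queued operation; an error never stops the batch."""
        while self.sq:
            user_data, op, args = self.sq.popleft()
            try:
                res = op(*args)
            except OSError as e:
                res = -e.errno  # failure reported as a negative errno
            self.cq.append((user_data, res))

    def reap(self):
        done = list(self.cq)
        self.cq.clear()
        return done


def write_file(path, data):
    """Create path and write data, returning the number of bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    ring = ToyRing()
    with tempfile.TemporaryDirectory() as d:
        ring.submit(1, write_file, os.path.join(d, "a.txt"), b"hello")
        # The parent directory does not exist, so this entry fails.
        ring.submit(2, write_file, os.path.join(d, "missing", "b.txt"), b"nope")
        ring.submit(3, write_file, os.path.join(d, "c.txt"), b"world")
        ring.kernel_process()
        results = dict(ring.reap())
        survived = sorted(os.listdir(d))
    return results, survived
```

After `demo()`, entries 1 and 3 report 5 bytes written and their files exist, while entry 2 reports a negative error code. The side effects of the successful entries stay in place, which is exactly what a transaction would not allow.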
|
|
|