AndyKelley 2 days ago
No, I hadn't read the linked thread until you prodded me. Now I have and I understand the situation entirely. I'll give a brief overview; feel free to ask any follow-up questions. A straightforward implementation of memchr, i.e. finding the index of a particular byte inside an array of bytes, looks like this:
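A minimal sketch of such a straightforward implementation, written here in C. The name `simple_memchr` and the convention of returning `len` when the byte is absent are illustrative assumptions, not taken from the thread:

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first occurrence of
 * `needle` in haystack[0..len), or `len` if the byte is not present.
 * Every read stays inside the array, so nothing here is out of bounds. */
size_t simple_memchr(const unsigned char *haystack, size_t len,
                     unsigned char needle) {
    for (size_t i = 0; i < len; i++) {
        if (haystack[i] == needle) {
            return i;
        }
    }
    return len;
}
```

Each iteration touches exactly one byte that belongs to the array, which is why a loop of this shape has well-defined behavior.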
This is trivial to lower to well-defined LLVM IR.

But it's desirable to use tricks to make the function really fast, such as assuming that you can read up to the page boundary with SIMD instructions[1]. This is generally true on real-world hardware, but it is incompatible with the pointer provenance memory model, which is load-bearing for important optimizations that C, C++, Rust, and Zig all rely on.

So if you want to do such tricks, you have to do them in a black box that is exempt from the memory model rules. The Zig code I link to here is unsound because it does not do this. An optimization pass, whether implemented in the Zig pipeline or the LLVM pipeline, would be able to prove that the code reads outside the pointer's provenance, mark that particular control flow unreachable, and thereby cause undefined behavior if it happens.

This is not really LLVM's fault. This is a language shortcoming in C, C++, Rust, Zig, and probably many others. It's a fundamental conflict between the utility of pointer provenance rules and the utility of ignoring that crap and just doing what you know the machine allows you to do.

[1]: https://github.com/ziglang/zig/blob/0.14.1/lib/std/mem.zig#L...
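To make the page-boundary assumption concrete, here is a rough sketch in C of the arithmetic behind it. The 4096-byte page size and the name `bytes_until_page_end` are assumptions for illustration; real page sizes vary by platform. The sketch only computes how far an address is from the end of its page. It does not perform the wide read itself, which is the part the memory model objects to:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration only; must be a power of two. */
#define ASSUMED_PAGE_SIZE ((uintptr_t)4096)

/* Hypothetical helper: number of bytes from `addr` up to (not including)
 * the first byte of the next page. The hardware will not fault on a load
 * of at most this many bytes starting at `addr`, provided the page
 * containing `addr` is itself mapped and readable. */
size_t bytes_until_page_end(uintptr_t addr) {
    return (size_t)(ASSUMED_PAGE_SIZE - (addr & (ASSUMED_PAGE_SIZE - 1)));
}
```

A SIMD memchr built on this idea would issue, say, a 16-byte load whenever `bytes_until_page_end` reports at least 16, even if the array ends sooner. The machine tolerates that load, but the extra bytes lie outside the object the pointer was derived from, which is the conflict described above.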
ncruces 2 days ago
Thanks for taking the time! I was the original contributor of the SIMD code, and got this… pushback. I still don't quite understand how you can marry “pointer provenance” with the intent that converting between pointers and integers is “to be consistent with the addressing structure of the execution environment”, and want to allow DMA in your language, yet have this be UB. But well, a workable version of it got submitted, and I've made subsequent contributions (memchr, strchr, str[c]spn…), all good. It just makes me salty about C, as if I needed more reasons to be.
|