| ▲ | greysphere 4 hours ago |
| The examples aren't really undefined behavior. They are examples that could become UB depending on input or circumstances. If you are going to be that generous, every function call is UB because it could exceed stack space, which is basically true in any language (up to the equivalent definition of UB in that language). I feel like C has enough actual rough edges that deserve attention that sensationalism like this muddies folks' attention (particularly novices') and can end up doing more harm than good. |
|
| ▲ | guerby 3 hours ago | parent | next [-] |
| Ada 83 has no UB on call stack overflow. From the reference manual: http://archive.adaic.com/standards/83lrm/html/lrm-11-01.html "STORAGE_ERROR This exception is raised in any of the following situations: (...) or during the execution of a subprogram call, if storage is not sufficient." |
| |
| ▲ | veltas 3 hours ago | parent [-] | | So it's just as useful as when your stack area ends with a page that will segfault on access, or your CPU will raise an interrupt if the stack pointer goes beyond a particular address? It's not safe though, because throwing an exception, panicking, etc., is still a denial of service. It's just more deterministic than silently overwriting the heap instead. If the program is critical then you need to be able to statically prove the full size of the stack, which you can do with C and C++ with the right tools and restrictions. | | |
| ▲ | bregma 23 minutes ago | parent | next [-] | | A segfault is considered safe if you're talking about functional safety because it results in a return to a defined safe state (RTDSS). If a segfault leads to some other state you do not deem "safe", such as a single program gating access to a valuable asset with a default fail state of "allow", you just have a fundamental design flaw in your system. The safety problem is you or your AI agent, not the segfault. | |
| ▲ | simonask 2 hours ago | parent | prev [-] | | Deterministic, well-defined behavior is inherently safer than undefined behavior. It allows you to diagnose the problem and fix it. UB emphatically does not, and I don't dare to think of how many millions of person-hours are wasted every year dealing with the results. |
|
|
|
| ▲ | eru 3 hours ago | parent | prev | next [-] |
| That's not true at all. First, you can define what happens when stack space is exceeded. Second, not all programs need an arbitrary amount of stack space; some only need a constant amount that can be calculated ahead of time. (And some languages don't use a stack at all in their implementations.) Your language could also offer tools to probe how much stack space you have left, and make guarantees based on that. Or it could let you install handlers for what to do when you run out of stack space. |
|
| ▲ | pjc50 3 hours ago | parent | prev | next [-] |
| UB based on input can be an exploit vector. |
| |
| ▲ | layer8 3 hours ago | parent [-] | | Unvalidated input can always be an exploit vector. | | |
| ▲ | Ygg2 3 hours ago | parent [-] | | Except in C, validation of user input can in itself be an exploit vector. | | |
| ▲ | layer8 3 hours ago | parent | next [-] | | That’s true in other languages as well. Any programmatic task can end up being an exploit vector. | | |
| ▲ | pjc50 2 hours ago | parent [-] | | No? That's the whole point of formal verification? You can even kind of retrofit this to C. The classic example is "seL4". You just need a set of proofs that the code doesn't trigger UB. This ends up being much larger and more complicated than the C itself. |
| |
| ▲ | greybeard69 3 hours ago | parent | prev [-] | | Turtles all the way down. |
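To illustrate the point above that input validation in C can itself trigger UB: a common mistake is to check for overflow with something like `a + b < a`, which performs the signed addition first and so has already overflowed by the time the comparison runs. The following is a minimal sketch of one way to do the check before the operation. The function name `checked_add_int` is hypothetical, not something from the thread or the linked article.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: adds a and b only if the result fits in an int.
 * Returns true and stores the sum in *out on success; returns false and
 * leaves *out untouched if the addition would overflow. The range check
 * happens before the addition, so no signed overflow is ever evaluated. */
static bool checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);

    if (b > 0 && a > INT_MAX - b) {
        return false; /* a + b would exceed INT_MAX */
    }
    if (b < 0 && a < INT_MIN - b) {
        return false; /* a + b would fall below INT_MIN */
    }

    *out = a + b;
    return true;
}
```

The subtractions `INT_MAX - b` and `INT_MIN - b` cannot overflow because of the sign guards on `b`, which is what makes the validation itself well defined.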
|
|
|
|
| ▲ | stevenhuang 3 hours ago | parent | prev | next [-] |
| The examples are unequivocally UB. Full stop. How to think of this properly is that when you have UB, you are no longer under the auspices of a language standard. Things may work fine for a time, indefinitely even. But what happens instead is you unknowingly become subject to whimsies of your toolchain (swap/upgrade compilers), architecture, or runtime (libc version differences). You end up building a foundation on quicksand. That's the danger of UB. |
| |
| ▲ | flohofwoe 3 hours ago | parent [-] | | > The examples are unequivocally UB. Full stop. Tbh, already the first example (unaligned pointer access) is bogus and the C standard should be fixed (in the end the list of UB in the C standard is entirely "made up" and should be adapted to modern hardware, a lot of UB was important 30 years ago to allow optimizations on ancient CPUs, but a lot of those hardware restrictions are long gone). In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not. On most modern CPUs unaligned load/stores are no problem at all (not even a performance penalty unless you straddle a cache line). There's no point in restricting the entire C standard because of the behaviour of a few esoteric CPUs that are stuck in the past. PS: we also need to stop with the "what if there is a CPU that..." discussions. The C standard should follow the current hardware, and not care about 40 year old CPUs or theoretical future CPU architectures. If esoteric CPUs need to be supported, compilers can do that with non-standard extensions. | | |
| ▲ | account42 2 hours ago | parent | next [-] | | Not having unaligned access in the language allows the compiler to assume that, for basic types where the alignment is at least the size, if two addresses are different then they don't alias and writes to one can't change the result of reads from the other. That's a very useful assumption to be able to make for optimization - much more useful than yolocasting pointers in a way that could get you unaligned ones. | |
| ▲ | leni536 2 hours ago | parent | prev | next [-] | | Undefined means that ISO C doesn't define the behavior. An implementation is free to do so. | | |
| ▲ | simonask 2 hours ago | parent [-] | | If they do, that is no longer an implementation of C. It is a dialect of C, and there are many (GNU C being the most popular), but there are real drawbacks to using dialects. This is in contrast to the other category that exists, which is "implementation-defined". | | |
| ▲ | 1718627440 an hour ago | parent [-] | | > If they do, that is no longer an implementation of C. This is plain wrong. Undefined behaviour means the C standard specifies no restriction on the behaviour of the program, so the behaviour is whatever the implementation chooses to emit. An implementation can very well choose to emit any program it pleases, including programs that encrypt your hard disk, but also programs that stick to well-defined rules. | | |
| ▲ | simonask 36 minutes ago | parent [-] | | Sure, but the point is that code written against such a compiler is not C and is not portable. It is written in a dialect of C, and that comes with drawbacks. Writing C (or any language) means adhering to the standard, because that's the definition of the language. |
|
|
| |
| ▲ | stevenhuang 3 hours ago | parent | prev | next [-] | | I agree. I meant to elaborate more on how to think of UB. For most C software on x86_64, UB is "fine" with very strong bunny ears. But it is preferable for one to, shall we say, write UB intentionally rather than accidentally and unknowingly. Having an awareness of all the minefields leads to more respect for the dangers of C code, it makes one question literally everything, and that would hopefully result in more correct code, more often. On that note, on some RISC-V cores unaligned access can turn a single load into hundreds of instructions. I think the problem is just that C is underspecified for what we expect a language to provide in the modern age. It is still a great language, but the edges are sharp. | |
| ▲ | IshKebab 3 hours ago | parent | prev [-] | | There are still modern CPUs that don't support misaligned access. It would be insane for C to mandate that misaligned accesses are supported. However I do agree that just saying "the behaviour is undefined" is an unhelpful cop-out. They could easily say something like "non-atomic misaligned accesses either succeed or trap" or something like that. > In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not. Not just the CPU - memory decides as well. MMIO devices often don't support misaligned accesses. | | |
| ▲ | 1718627440 an hour ago | parent | next [-] | | > They could easily say something like "non-atomic misaligned accesses either succeed or trap" or something like that. That means that the compiler must emit the read, even if the value is already known or never used, as it might trap. There is a reason for the UB! | |
| ▲ | thayne 3 hours ago | parent | prev | next [-] | | On hardware that doesn't support it, misaligned loads could be compiled to multiple loads and shifts. Probably not great for performance, and it doesn't work if you need it to be atomic, but it isn't impossible. | | | |
| ▲ | account42 2 hours ago | parent | prev [-] | | For x86 SSE there are aligned instructions that will trap on unaligned access. |
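As a rough sketch of the "multiple loads and shifts" idea mentioned above, here is how a program can read a 32-bit value from an arbitrary address in portable C today, without casting to a possibly misaligned `uint32_t *`. Both function names are hypothetical. The first assembles a little-endian value from individual byte loads; the second uses `memcpy`, which compilers commonly lower to a single load on targets that tolerate unaligned access (that lowering is typical behaviour, not something the standard guarantees).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit little-endian value from any address.
 * Only byte-sized loads are performed, so alignment never matters. */
static uint32_t load_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read a 32-bit value in native byte order from any
 * address. memcpy has no alignment requirement on its source. */
static uint32_t load_u32_native(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Neither version is atomic, which matches the caveat in the comment above: this approach does not help when the access must be a single indivisible load.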
|
|
|
|
| ▲ | account42 2 hours ago | parent | prev [-] |
| Yes, this article is pretty much the definition of FUD. |