muvlon 2 hours ago
Yes, there is tons of surprising and weird UB in C, but this article doesn't do a great job of showcasing it. It barely scratches the surface. Here's a way weirder example:

This is totally fine if x is just an int, but the volatile makes it UB. Why? 5.1.2.4.1 says any access to a volatile object, even just reading it, is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are unsequenced w.r.t. each other; the only sequence point comes after all of them, just before the call.

So in common parlance, a "data race" is any pair of concurrent accesses to the same object from different threads, at least one of which is a write. In C, we can have a data race on a single thread and without any writes!
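A minimal sketch of the pattern described, assuming the example passes the same `volatile int` twice to one function call. The names `x`, `add` and `add_twice_ok` are hypothetical; the UB form is kept in a comment so the sketch itself stays well-defined.

```c
#include <assert.h>

/* Hypothetical volatile object, e.g. standing in for a hardware register. */
static volatile int x = 21;

static int add(int a, int b) { return a + b; }

/* The UB form, per the reasoning above: both arguments read the volatile x,
   each read is a side effect, and the two argument evaluations are
   unsequenced relative to each other:

       int r = add(x, x);

   Well-defined alternative: give each volatile read its own full expression,
   so the first read is sequenced before the second. */
static int add_twice_ok(void) {
    int a = x;
    int b = x;
    return add(a, b);
}
```

If something outside the program really does change `x` between the two reads, `a` and `b` can still differ; the point is only that the reads now happen in a defined order.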
thomashabets2 2 minutes ago
Author here.

> It barely scratches the surface.

I agree. The point of the post is not to enumerate and explain the implications of all 283 uses of the word "undefined" in the standard, nor to enumerate all the things that are undefined by omission. The point of the post is to say that it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has managed it. And if nobody has succeeded in 54 years, "try harder" or "just never make a mistake" is, at the very least, not the solution.

The (one!) exploitable flaw found by Mythos in OpenBSD was an impressive endorsement of the OpenBSD developers, and yet, as the post says, I pointed it at the simplest of their code and found a heap of UB. Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) after a `waitpid(&status)` call, before checking whether `waitpid()` returned an error? (Not reported.) No, I can't imagine an architecture or compiler where it would be.

FTA:

> The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer?

My point is that ALL nontrivial C and C++ code has UB.
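For contrast, a minimal POSIX sketch of the safe ordering: check what `waitpid()` returned before looking at `status`. The helper names `wait_exit_code` and `run_child_exiting_with` are hypothetical, and this is not OpenBSD's actual `find` code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit code, or -1 if waitpid() failed or the child did
   not exit normally. `status` is only read after waitpid() reports success,
   because on failure it is never written and stays uninitialized. */
static int wait_exit_code(pid_t pid) {
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);  /* retry if interrupted by a signal */
    if (r == -1)
        return -1;                        /* error: do not touch status */
    if (!WIFEXITED(status))
        return -1;                        /* e.g. terminated by a signal */
    return WEXITSTATUS(status);
}

/* Hypothetical helper: fork a child that exits immediately with `code`
   and return what wait_exit_code() reports for it. */
static int run_child_exiting_with(int code) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child */
    return wait_exit_code(pid);           /* parent */
}
```

Calling `wait_exit_code(getpid())` shows the error path: the calling process is not its own child, so `waitpid()` fails and `status` is never read.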
tialaramex 5 minutes ago
Volatile is a type system hack. They should have done a more principled fix, and modern languages certainly should not act as though "C did it" makes it a good idea.

The reason for the hack is that very early C compilers just always spilled, so you could write MMIO driver code by setting a pointer to point at the MMIO hardware, and it actually worked, because every time you changed x the CPU instruction performed a memory write. Once C compilers got some basic optimisations, that obvious "clever" trick stopped working, because the compiler can see that we're just modifying x over and over and over, so it doesn't spill x from a register and the driver doesn't work properly.

C's "volatile" keyword is a hack saying "OK compiler, forget that optimisation", which was presumably a few minutes' work to implement, whereas the correct fix, providing MMIO intrinsics in the associated library, was a lot of work.

Why should you want intrinsics here? Intrinsics let you actually spell out what's possible and what isn't. On some targets we can do a 1-byte, 2-byte, and 4-byte write; those are distinct operations and the hardware knows the difference. So, e.g., maybe some device expects a 4-byte RGBA write, and if you emit four 1-byte writes that's very confusing and maybe it doesn't work; don't do that. On some targets bit-level writes are available: you can say "OK, MMIO write to bit 4 of address 0x1234" and it will write a single bit. If you only have volatile, there's no way to know what happens or what it means.
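Portable C has no MMIO intrinsics, so the closest you can get is a set of explicit-width accessor functions. A minimal sketch with hypothetical names (`mmio_read8`, `mmio_write32`, etc.). It still goes through `volatile` underneath, so it only makes the intended width explicit at the call site; it does not make the compiler promise a single bus transaction of that width, which is exactly the gap described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical explicit-width MMIO accessors. The width is part of the
   name, rather than being implied by whatever type a pointer happens to
   have. Implemented with volatile because standard C offers nothing else. */
static inline uint8_t mmio_read8(const volatile uint8_t *reg) {
    assert(reg != NULL);
    return *reg;
}

static inline void mmio_write8(volatile uint8_t *reg, uint8_t value) {
    assert(reg != NULL);
    *reg = value;
}

static inline uint32_t mmio_read32(const volatile uint32_t *reg) {
    assert(reg != NULL);
    return *reg;
}

static inline void mmio_write32(volatile uint32_t *reg, uint32_t value) {
    assert(reg != NULL);
    *reg = value;
}
```

On real hardware `reg` would point at a device address from the datasheet; pointing it at an ordinary variable is enough to exercise the functions.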
mananaysiempre an hour ago
And it makes sense as long as you allow the concept of unsequenced operations at all (admittedly that’s somewhat rare; e.g. in Scheme such things are defined to still occur in sequence, but which specific sequence is unspecified and potentially different each time). The “volatile” annotation marks your variable as being an MMIO register or something of that nature, something that could change at any point for reasons outside of the compiler’s control. Naturally, this means all of the hazards of concurrent modification are potentially there.

That said, your “common parlance” definition of “data race” is not the definition used by the C standard, so your last sentence is at best misleading in a discussion of standard C:

> The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

(Here “conflicting” and “happens before” are defined in the preceding text.)
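Under that definition, what removes a data race is atomicity or a happens-before ordering, not `volatile`. A minimal C11 `<stdatomic.h>` sketch of a one-shot handoff, with hypothetical names: the release store and acquire load on `ready` order the plain accesses to `payload`, so there is no data race when `publish` (called at most once) and `try_consume` run on different threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state. `payload` is a plain (non-atomic) int.
   `ready` has static storage, so zero-initialization makes it false. */
static int payload;
static atomic_bool ready;

/* Producer: call at most once. The plain write to payload is sequenced
   before the release store, which publishes it. */
static void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: if the acquire load sees true, it synchronizes with the release
   store, so the write to payload happens before this read of it. */
static bool try_consume(int *out) {
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Without the atomic flag, a write to `payload` on one thread and a read on another would be two conflicting actions with neither happening before the other, which is exactly what the quoted paragraph makes undefined.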
berti 29 minutes ago
As an example of a possible side effect here: reading a register on a microcontroller peripheral may well reset it, and that's exactly the kind of thing you use volatile for.
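A minimal sketch of the usual consequence for clear-on-read registers: read the register once into a local snapshot, then decode the snapshot. The register layout and the names `STATUS_RX_READY`, `STATUS_TX_EMPTY` and `poll_status` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits of a peripheral status register whose flags are
   cleared by the hardware when the register is read. */
#define STATUS_RX_READY (1u << 0)
#define STATUS_TX_EMPTY (1u << 1)

struct events {
    bool rx_ready;
    bool tx_empty;
};

/* Read the register exactly once and decode that snapshot. Testing each
   flag with its own read could lose whichever flag the first read cleared.
   Because the pointer is volatile, the compiler may not drop this read or
   add extra ones. */
static struct events poll_status(const volatile uint32_t *status_reg) {
    assert(status_reg != NULL);
    uint32_t snapshot = *status_reg;
    struct events ev = {
        .rx_ready = (snapshot & STATUS_RX_READY) != 0,
        .tx_empty = (snapshot & STATUS_TX_EMPTY) != 0,
    };
    return ev;
}
```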
simonask 2 hours ago
I think the article's point is that you don't actually have to get weird at all to run into UB. Lots of people mistakenly think that C and C++ are "really flexible" because they let you do "what you want". The truth of the matter is that almost every fancy, powerful thing you think you can do is an absolute minefield of UB. | |||||||||||||||||||||||||||||||||||||||||||||||
RobotToaster 5 minutes ago
With volatile it could be changed by an interrupt service routine between reads, so it makes sense. | |||||||||||||||||||||||||||||||||||||||||||||||
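Hosted C has a close analogue to an ISR: a signal handler. A minimal sketch with hypothetical names (`got_signal`, `on_signal`, `install_handler`). The flag is `volatile sig_atomic_t`, the type the C standard allows a signal handler to assign to, and `volatile` forces the main flow to actually reload it rather than keep a cached copy in a register, e.g. in a loop like `while (!got_signal) { }`.

```c
#include <assert.h>
#include <signal.h>

/* Modified asynchronously by the handler, outside the normal control flow
   the compiler can see, hence volatile. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig) {
    (void)sig;
    got_signal = 1;
}

/* Install on_signal for `sig`; returns 0 on success, -1 on failure. */
static int install_handler(int sig) {
    return signal(sig, on_signal) == SIG_ERR ? -1 : 0;
}
```

`raise()` does not return until the handler has run, so after `raise(SIGINT)` the flag is guaranteed to be set.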
sethev an hour ago
Yes, there is a data race there. The value of a volatile can be changed by something outside the current thread. That’s what volatile means and why it exists.

Edit: thread = thread of execution. I’m not making a point about thread safety within a program.