Remix.run Logo
uecker 5 days ago

It is also not what the C community has chosen. It is what was imposed on us by certain optimizing compilers that used the interpretation that gave them maximum freedom to excel in benchmarks, and it was then endorsed by C++. The C definition is that "undefined behavior" can have arbitrary concrete behavior, not that a compiler can assume it does not happen. (that form semantic people prefer the former because it makes their life easier did not help)

ralfj 3 days ago | parent [-]

> The C definition is that "undefined behavior" can have arbitrary concrete behavior, not that a compiler can assume it does not happen.

What is the difference between those? How does a compiler that assumes UB never happens violate the requirement that UB can have arbitrary concrete behavior? If we look at a simple example like optimizing "x + y > x" (signed arithmetic, y known to be positive) to "true" -- that will lead to some arbitrary concrete behavior of the program, so it seems covered by the definition.

I assume that what the original C authors meant was closer to "on signed integer overflow, non-deterministically pick some result from the following set", but that's not what they wrote in the standard... if you want to specify that something is non-deterministic, you need to spell out exactly what the set of possible choices are. Maybe for singed integer overflow one could infer this (though it really should be made explicit IMO), but C also says that the program has UB "by default" if it runs into a case not described by the standard, and there's just no way to infer a set of choices from that as far as I can see.

uecker 3 days ago | parent [-]

"arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

ralfj 3 days ago | parent [-]

> "arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

So that rules out "time-traveling UB", but it would still permit optimizing "x+y < x" to "false" for non-negative y, right? I can't tell if you think that that is a legal transformation or not, and I'd be curious to know. :)

FWIW I agree we shouldn't let UB time-travel. We should say that all observable events until the point of UB must be preserved. AFAIK that is e.g. what CompCert does. But I would still describe that as "the compiler may assume that UB does not happen" (and CompCert makes use of that assumption for its optimizations), so I don't understand the distinction you are making.

> In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

Except for "unspecified value" which apparently can be very different from just non-deterministically choosing any value (https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm). If "unspecified" always meant "normal non-deterministic choice", then this would clearly be a miscompilation: https://godbolt.org/z/9Gqqs7axj.

Quite a few places in the standard just say "result/behavior is unspecified", so the set of alternatives is often not very clear IMO. In particular, when it says that under some condition "the result is unspecified", and let's say the result has integer type, does that mean it non-deterministically picks some "normal" integer value, or can it be an "unspecified value" that behaves more like LLVM undef in that it is distinct from every "normal" value and can violate basic properties like "x == x"?