Remix.run Logo
uecker 3 days ago

"arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

ralfj 3 days ago | parent [-]

> "arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

So that rules out "time-traveling UB", but it would still permit optimizing "x+y < x" to "false" for non-negative y, right? I can't tell if you think that that is a legal transformation or not, and I'd be curious to know. :)

FWIW I agree we shouldn't let UB time-travel. We should say that all observable events until the point of UB must be preserved. AFAIK that is e.g. what CompCert does. But I would still describe that as "the compiler may assume that UB does not happen" (and CompCert makes use of that assumption for its optimizations), so I don't understand the distinction you are making.

> In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

Except for "unspecified value" which apparently can be very different from just non-deterministically choosing any value (https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm). If "unspecified" always meant "normal non-deterministic choice", then this would clearly be a miscompilation: https://godbolt.org/z/9Gqqs7axj.

Quite a few places in the standard just say "result/behavior is unspecified", so the set of alternatives is often not very clear IMO. In particular, when it says that under some condition "the result is unspecified", and let's say the result has integer type, does that mean it non-deterministically picks some "normal" integer value, or can it be an "unspecified value" that behaves more like LLVM undef in that it is distinct from every "normal" value and can violate basic properties like "x == x"?