zozbot234 5 days ago

That's "unspecified" not "undefined". "Undefined behavior" literally means "anything goes", so any program that invokes it is broken by definition.

bigstrat2003 5 days ago | parent [-]

That is not true; that is a very specific definition of UB which C developers (among others) favor. That doesn't mean that another language can't say "this is undefined behavior" without all the baggage that accompanies the term in C.

zozbot234 5 days ago | parent | next [-]

It's literally how the term "UB" is defined, and understood by experts. Why would anyone want to say "undefined" when they really mean "unspecified"? That's just confusing.

bigstrat2003 5 days ago | parent [-]

No, it's how one very specific community of experts understands it. It is not some kind of universal law of definition that it must mean that always and everywhere. As far as what is confusing, that is a matter of perspective. I think it is confusing (to put it mildly) that the C community has chosen to use "undefined behavior" to mean "it must never happen, and anything goes if it does". That is extremely counterintuitive, and only makes sense to those who live and breathe that world. So if the standard is to be "avoiding confusion", then we better change the definition used by the C community ASAP.

ameliaquining 5 days ago | parent | next [-]

I agree that the term "undefined behavior", when used as in C/C++/Rust/Swift/.NET, isn't very good at communicating to non-experts what's at stake, not least because it doesn't sound scary enough (the security community remains indebted to whoever coined the term "nasal demons"). That said, is there a specific other community of practice where there's a shared understanding that the term "undefined behavior" means something different?

uecker 5 days ago | parent | prev [-]

It is also not what the C community has chosen. It is what was imposed on us by certain optimizing compilers that used the interpretation that gave them maximum freedom to excel in benchmarks, and it was then endorsed by C++. The C definition is that "undefined behavior" can have arbitrary concrete behavior, not that a compiler can assume it does not happen. (That formal semantics people prefer the former because it makes their lives easier did not help.)

ralfj 3 days ago | parent [-]

> The C definition is that "undefined behavior" can have arbitrary concrete behavior, not that a compiler can assume it does not happen.

What is the difference between those? How does a compiler that assumes UB never happens violate the requirement that UB can have arbitrary concrete behavior? If we look at a simple example like optimizing "x + y > x" (signed arithmetic, y known to be positive) to "true" -- that will lead to some arbitrary concrete behavior of the program, so it seems covered by the definition.
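To make the "x + y > x" example concrete, here is a minimal sketch in C of the two readings being contrasted. The function names are hypothetical, and the code assumes that long long is wider than int (true on common platforms, not guaranteed by the standard). Neither function ever performs a signed overflow, so both are fully defined; they only model what each reading would answer.

```c
#include <limits.h>
#include <stdbool.h>

/* Reading 1: overflow "cannot happen", so the comparison is evaluated as if
   on unbounded integers. Widening to long long keeps the sum in range
   (assumption: long long is wider than int). For y > 0 this is always true,
   which is why an optimizer may fold the expression to "true". */
bool greater_exact(int x, int y) {
    return (long long)x + (long long)y > (long long)x;
}

/* Reading 2: two's complement wraparound on overflow.
   Precondition: y > 0. The wrapped sum is greater than x exactly when the
   addition does not overflow, i.e. when x <= INT_MAX - y. Written this way,
   no overflowing addition is ever executed. */
bool greater_wrapping(int x, int y) {
    return x <= INT_MAX - y;
}
```

The two functions agree everywhere except where the int addition would overflow, for instance at x = INT_MAX, y = 1. That one input is where the question of what a compiler may assume becomes visible.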

I assume that what the original C authors meant was closer to "on signed integer overflow, non-deterministically pick some result from the following set", but that's not what they wrote in the standard... if you want to specify that something is non-deterministic, you need to spell out exactly what the set of possible choices is. Maybe for signed integer overflow one could infer this (though it really should be made explicit IMO), but C also says that the program has UB "by default" if it runs into a case not described by the standard, and there's just no way to infer a set of choices from that as far as I can see.

uecker 3 days ago | parent [-]

"arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

ralfj 3 days ago | parent [-]

> "arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

So that rules out "time-traveling UB", but it would still permit optimizing "x+y < x" to "false" for non-negative y, right? I can't tell if you think that that is a legal transformation or not, and I'd be curious to know. :)

FWIW I agree we shouldn't let UB time-travel. We should say that all observable events until the point of UB must be preserved. AFAIK that is e.g. what CompCert does. But I would still describe that as "the compiler may assume that UB does not happen" (and CompCert makes use of that assumption for its optimizations), so I don't understand the distinction you are making.

> In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

Except for "unspecified value" which apparently can be very different from just non-deterministically choosing any value (https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm). If "unspecified" always meant "normal non-deterministic choice", then this would clearly be a miscompilation: https://godbolt.org/z/9Gqqs7axj.

Quite a few places in the standard just say "result/behavior is unspecified", so the set of alternatives is often not very clear IMO. In particular, when it says that under some condition "the result is unspecified", and let's say the result has integer type, does that mean it non-deterministically picks some "normal" integer value, or can it be an "unspecified value" that behaves more like LLVM undef in that it is distinct from every "normal" value and can violate basic properties like "x == x"?

bakugo 5 days ago | parent | prev | next [-]

"Undefined behavior" is not a meaningless made up term that you can redefine at will.

The word "undefined" has a clear meaning: there is no behavior defined at all for what a given piece of code will do, meaning it can literally do anything. If the language spec defines the possible behaviors you can expect (even if the behavior can vary between implementations), then by definition it's not undefined.

bigstrat2003 5 days ago | parent [-]

> "Undefined behavior" is not a meaningless made up term that you can redefine at will.

Sure, I agree with that.

> The word "undefined" has a clear meaning: there is no behavior defined at all for what a given piece of code will do...

That is true, but...

> ...meaning it can literally do anything.

This is not at all true! That is a different (but closely related) matter, which is "what is to be done about undefined behavior". Which is certainly something one has to take a stance on when working to a language spec that has undefined behavior, but that does not mean that "undefined" automatically means your preferred interpretation of how to handle undefined behavior.

zozbot234 5 days ago | parent [-]

The original question is how UB is defined, not about the preferred way of dealing with it in a practical sense. And the definition of UB is behavior for which the language definition imposes no requirements, and explicitly leaves open the possibility of ignoring the situation altogether with unpredictable results.

gliptic 5 days ago | parent | prev [-]

The author is using the term in the way that everyone else understands it. They are not aware of your unusual definition.