Remix.run Logo
munificent 5 days ago

I agree with the author's claim that you need thread safety for memory safety.

But I don't agree with:

> I will argue that this distinction isn’t all that useful, and that the actual property we want our programs to have is absence of Undefined Behavior.

There is plenty of undefined behavior that can't lead to violating memory safety. For example, in many languages, argument evaluation order is undefined. If you have some code like:

    foo(print(1), print(2));
In some languages, it's undefined as to whether "1" is printed before "2" or vice versa. But there's no way to violate memory safety with this.

I think the only term the author needs here is "memory safety", and they correctly observe that if the language has threading, then you need a memory model that ensures that threads can't break your memory safety.

Go lacks that. It seems to be a rare problem in practice, but if you want guarantees, Go doesn't give you them. In return, I guess it gives you slightly faster execution speed for writes that it allows to potentially be torn.

gliptic 5 days ago | parent | next [-]

The evaluation order is _unspecified_, not undefined behaviour.

gpderetta 5 days ago | parent [-]

Interestingly, at least in C++, this was changed in the recent past. It used to be that evaluation of arguments was not sequenced at all and if any evaluation touched the same variable, and at least one was a write, it was UB.

It was changed as part of the C++11 memory model and now, as you said, there is a sequenced-before order, it is just unspecified which one it is.

I don't know much about C, but I believe it was similarly changed in C11.

tialaramex 5 days ago | parent | next [-]

Sure, prior to the C++ 11 memory model there just isn't a memory ordering model in C++ and all programs in either C or C++ which would need ordering for correctness did not have any defined behaviour in the language standard.

This is very amusing because that means in terms of the language standard Windows and Linux, which both significantly pre-date C++ 11 and thus its memory model, were technically relying on Undefined Behaviour. Of course, as operating systems they're already off piste because they're full of raw assembly and so on.

Linux has its own ordering model as a result, pre-dating the C++ 11 model. Linus is writing software for multi-processor computers more than a decade before the C++ 11 model so obviously he can't wait around for that.

[Edit: Corrected Linux -> Linux when talking about the man]

gpderetta 5 days ago | parent [-]

It is not so much that windows and linux were relying on UB, but that these platforms, with their compilers, provided guarantees beyond the standard. e.g. GCC not only aims for C/C++ standard compliance, but also POSIX.

Of course these guarantees were often not fully written down nor necessarily self consistent (but then again, neither is the current standard).

gliptic 5 days ago | parent | prev [-]

Yes, but that's just a subset of expressions where unspecified sequencing applied. For instance, the example with two `print()` as parameters would have a sequence point (in pre-C++11 terminology) separating any reads/writes inside the `print` due to the function calls. It would never be UB even though the order in which the prints are called is still unspecified.

gpderetta 5 days ago | parent [-]

IIRC the point was that there was no sequence point between argument evaluation, so for example f(++i, ++i) was UB. Or maybe it was only for builtin operators?

Cppreference is not authoritative[1], but seems to support my recollection. In fact it states that the f(++i, ++i) was UB till C++17.

[1] https://en.cppreference.com/w/cpp/language/eval_order.html, Pre C++11 Ordering Rules, point (2).

gliptic 5 days ago | parent [-]

`f(++i, ++i)` is/was indeed UB, but the example in munificent's comment was `foo(print(1), print(2))` which as far as I know is not even if both `print` calls read/write the same memory.

gpderetta 5 days ago | parent [-]

(5) in the paragraph I mentioned earlier seems to prevent interleaving of function calls, which admittedly would make the language hard to use. So I think you are right.

ralfj 4 days ago | parent | prev | next [-]

> There is plenty of undefined behavior that can't lead to violating memory safety. For example, in many languages, argument evaluation order is undefined. If you have some code like:

You are mixing up non-determinism and UB. Sadly that's a common misunderstanding.

See https://www.ralfj.de/blog/2021/11/18/ub-good-idea.html for an explanation of what UB is, though I don't go into the distinction to non-determinism there.

zozbot234 5 days ago | parent | prev | next [-]

That's "unspecified" not "undefined". "Undefined behavior" literally means "anything goes", so any program that invokes it is broken by definition.

bigstrat2003 5 days ago | parent [-]

That is not true, that is a very specific definition of UB which C developers (among others) favor. That doesn't mean that another language can't say "this is undefined behavior" without all the baggage that accompanies the term in C.

zozbot234 5 days ago | parent | next [-]

It's literally how the term "UB" is defined, and understood by experts. Why would anyone want to say "undefined" when they really mean "unspecified"? That's just confusing.

bigstrat2003 5 days ago | parent [-]

No, it's how one very specific community of experts understands it. It is not some kind of universal law of definition that it must mean that always and everywhere. As far as what is confusing, that is a matter of perspective. I think it is confusing (to put it mildly) that the C community has chosen to use "undefined behavior" to mean "it must never happen, and anything goes if it does". That is extremely counterintuitive, and only makes sense to those who live and breathe that world. So if the standard is to be "avoiding confusion", then we better change the definition used by the C community ASAP.

ameliaquining 5 days ago | parent | next [-]

I agree that the term "undefined behavior", when used as in C/C++/Rust/Swift/.NET, isn't very good at communicating to non-experts what's at stake, not least because it doesn't sound scary enough (the security community remains indebted to whoever coined the term "nasal demons"). That said, is there a specific other community of practice where there's a shared understanding that the term "undefined behavior" means something different?

uecker 5 days ago | parent | prev [-]

It is also not what the C community has chosen. It is what was imposed on us by certain optimizing compilers that used the interpretation that gave them maximum freedom to excel in benchmarks, and it was then endorsed by C++. The C definition is that "undefined behavior" can have arbitrary concrete behavior, not that a compiler can assume it does not happen. (that form semantic people prefer the former because it makes their life easier did not help)

ralfj 3 days ago | parent [-]

> The C definition is that "undefined behavior" can have arbitrary concrete behavior, not that a compiler can assume it does not happen.

What is the difference between those? How does a compiler that assumes UB never happens violate the requirement that UB can have arbitrary concrete behavior? If we look at a simple example like optimizing "x + y > x" (signed arithmetic, y known to be positive) to "true" -- that will lead to some arbitrary concrete behavior of the program, so it seems covered by the definition.

I assume that what the original C authors meant was closer to "on signed integer overflow, non-deterministically pick some result from the following set", but that's not what they wrote in the standard... if you want to specify that something is non-deterministic, you need to spell out exactly what the set of possible choices are. Maybe for singed integer overflow one could infer this (though it really should be made explicit IMO), but C also says that the program has UB "by default" if it runs into a case not described by the standard, and there's just no way to infer a set of choices from that as far as I can see.

uecker 3 days ago | parent [-]

"arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

ralfj 3 days ago | parent [-]

> "arbitrary concrete behavior" means that at this point anything can happen on the real machine. This implies that everything before this point has to behave according to the specification. "is impossible" is stronger, as the whole program could behave erratically. But having partial correctness is important in a lot of scenarios and this is why we want to have it and in "UB" it is the former and not "impossible".

So that rules out "time-traveling UB", but it would still permit optimizing "x+y < x" to "false" for non-negative y, right? I can't tell if you think that that is a legal transformation or not, and I'd be curious to know. :)

FWIW I agree we shouldn't let UB time-travel. We should say that all observable events until the point of UB must be preserved. AFAIK that is e.g. what CompCert does. But I would still describe that as "the compiler may assume that UB does not happen" (and CompCert makes use of that assumption for its optimizations), so I don't understand the distinction you are making.

> In the ISO C standard, we use "unspecified" for a non-deterministic choice among clearly specified alternatives. So this is well understood.

Except for "unspecified value" which apparently can be very different from just non-deterministically choosing any value (https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm). If "unspecified" always meant "normal non-deterministic choice", then this would clearly be a miscompilation: https://godbolt.org/z/9Gqqs7axj.

Quite a few places in the standard just say "result/behavior is unspecified", so the set of alternatives is often not very clear IMO. In particular, when it says that under some condition "the result is unspecified", and let's say the result has integer type, does that mean it non-deterministically picks some "normal" integer value, or can it be an "unspecified value" that behaves more like LLVM undef in that it is distinct from every "normal" value and can violate basic properties like "x == x"?

bakugo 5 days ago | parent | prev | next [-]

"Undefined behavior" is not a meaningless made up term that you can redefine at will.

The word "undefined" has a clear meaning: there is no behavior defined at all for what a given piece of code will do, meaning it can literally do anything. If the language spec defines the possible behaviors you can expect (even if the behavior can vary between implementations), then by definition it's not undefined.

bigstrat2003 5 days ago | parent [-]

> "Undefined behavior" is not a meaningless made up term that you can redefine at will.

Sure, I agree with that.

> The word "undefined" has a clear meaning: there is no behavior defined at all for what a given piece of code will do...

That is true, but...

> ...meaning it can literally do anything.

This is not at all true! That is a different (but closely related) matter, which is "what is to be done about undefined behavior". Which is certainly something one has to take a stance on when working to a language spec that has undefined behavior, but that does not mean that "undefined" automatically means your preferred interpretation of how to handle undefined behavior.

zozbot234 5 days ago | parent [-]

The original question is how UB is defined, not about the preferred way of dealing with it in a practical sense. And the definition of UB is behavior for which the language definition imposes no requirements, and explicitly leaves open the possibility of ignoring the situation altogether with unpredictable results.

gliptic 5 days ago | parent | prev [-]

The author is using the term in the way that everyone else understands it. They are not aware of your unusual definition.

joaohaas 5 days ago | parent | prev [-]

Your example does not classify as 'undefined behavior'. Something is 'undefined behavior' if it is specified in the language spec, and in such case yes, the language is capable of doing anything including violating memory safety.