pizlonator 4 days ago

I don’t buy the “it’s because of optimization” argument.

And I especially don’t buy that UB is there for register allocation.

First of all, that argument only explains UB of OOB memory accesses at best.

Second, you could define the meaning of OOB by just saying “pointers are integers” and then further state that nonescaping locals don’t get addresses. Many ways you could specify that, if you cared enough. My favorite way to do it involves saying that pointers to locals are lazy thunks that create addresses on demand.

OskarS 4 days ago | parent | next [-]

No, it's absolutely because of optimization. For instance, C++20 defined signed integer representation as having two's complement, but signed integer overflow is still undefined behaviour. The reason is that if you compile with flags that make it defined, you lose a few percentage points of performance (primarily from preventing loop unrolling and auto-vectorization).
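
To make the loop point concrete, here's a sketch (mine, not from the standard or the paper) of the kind of loop this affects:

  // With signed i, overflow is UB, so the compiler may assume i never
  // wraps: the loop provably terminates and has a known trip count,
  // which permits unrolling and vectorization. Were i unsigned (with a
  // large enough bound), i could legally wrap modulo 2^32, and those
  // transforms get much harder to justify.
  void scale(float* a, int n) {
      for (int i = 0; i < n; i += 4)
          a[i] *= 2.0f;
  }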

Same thing with e.g. strict aliasing or the various UB that exists in the standard library. For instance, it's UB to pass a null pointer to strlen. Of course, you can make that perfectly defined by adding an `if` to strlen that just returns 0. But then you're adding a branch to every strlen, and C is simply not willing to do that for performance reasons, so they say "this is UB" instead.
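
Illustratively (this is a sketch, not the actual libc source), a strlen that defines the NULL case has to look something like this, and every correct caller pays for the check:

  #include <stddef.h>

  size_t strlen_checked(const char* s) {
      if (s == NULL)
          return 0;          // defined behavior for NULL...
      size_t n = 0;
      while (s[n] != '\0')   // ...but the branch taxes every caller,
          ++n;               // including the ones that never pass NULL
      return n;
  }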

Pretty much every instance of UB in standard C or C++ is there because making it defined would either hamper the optimizer or make standard library functions slower. They don't just make things UB for fun.

pizlonator 4 days ago | parent | next [-]

This isn’t the reason why the UB is in the spec in the first place. The spec left stuff undefined to begin with because of lack of consensus over what it should do.

For example, the reason two’s complement took so long is that there was still some machine running C that was ones’ complement.

> The reason is that if you compile with flags that make it defined, you lose a few percentage points of performance (primarily from preventing loop unrolling and auto-vectorization).

I certainly don’t lose any perf on any workload of mine if I set -fwrapv

If your claim is that implementers use optimization as the excuse for wanting UB, then I can agree with that.

I don’t agree that it’s a valid argument though. The performance wins from UB are unconvincing, except maybe on BS benchmarks that C compilers overtune for marketing reasons.
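
For context, the kind of code -fwrapv changes is the classic overflow-check idiom (a standard example, not one of my workloads):

  int will_overflow(int x) {
      return x + 1 < x;   // with UB signed overflow, compilers may fold
                          // this to 0; under -fwrapv it genuinely tests
                          // whether x == INT_MAX
  }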

OskarS 4 days ago | parent [-]

> For example the reason why 2s complement took so long is because of some machine that ran C that still existed that was 1s complement.

You're misunderstanding me: as of C++20, there is no representation for signed integers in C++ other than two's complement (no ones' complement, no sign-magnitude, nothing else), but signed overflow is still UB. It's not because of obscure machines or hardware; such hardware is not relevant for C++20 and later. The reason for it is performance. From the accepted paper [1]:

> The following polls were taken, and corresponding modifications made to the paper. The main change between [P0907r0] and the subsequent revision is to maintain undefined behavior when signed integer overflow occurs, instead of defining wrapping behavior. This direction was motivated by:

> * Performance concerns, whereby defining the behavior prevents optimizers from assuming that overflow never occurs

You may disagree, you may think they're wrong, but their motivation is performance; that's why this is UB. It's right there in black and white. This was C++, not C, but it's not at all unthinkable that the C standard will also mandate two's complement at some point, and if they do, they will almost certainly keep signed overflow undefined for exactly the same reason.

It's not hard to write code that optimizes much better when you use signed loop variables. One of my favorite examples is this function [2] to turn a 3D mesh inside out by flipping the edges of each triangle in a triangle mesh. The godbolt link has two versions of the same function, one with a signed loop variable, one with an unsigned one. The signed one auto-vectorizes and optimizes much better because it can assume that the loop variable never overflows (this version is C++, it's trivial to rewrite it in C and get the same results).
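
The shape of the code is roughly this (a sketch of the pattern, not the exact godbolt source):

  // Swapping two vertex indices per triangle reverses its winding.
  // With signed i, the compiler may assume i * 3 never overflows, so
  // the addressing is a simple linear induction and the loop
  // vectorizes; with unsigned i it must allow mod-2^32 wraparound in
  // i * 3, which blocks that.
  void flip_triangles(int* indices, int num_triangles) {
      for (int i = 0; i < num_triangles; ++i) {
          int tmp = indices[i * 3 + 1];
          indices[i * 3 + 1] = indices[i * 3 + 2];
          indices[i * 3 + 2] = tmp;
      }
  }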

This is why signed overflow is UB.

[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...

[2]: https://godbolt.org/z/a1P5Y17fn

pizlonator 4 days ago | parent [-]

I agree that the stated motivation for continuing to keep UB is performance.

I know that this is misguided based on my own perf tests and others’ perf tests.

Also, it’s wrong to say flat out that UB on signed ints is somehow necessary for perf when even a simple perf test shows that it just doesn’t matter, and the optimization it enables is quite obscure.

account42 4 days ago | parent | prev [-]

I wish there was a way to opt into undefined behavior for unsigned overflow. It's rare that wraparound is actually what you want, and in many cases overflow is still a bug. It sucks to have to either miss out on potential optimizations or miss out on the guarantee that the value can't be negative.
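
The closest thing available today (a sketch; add_nowrap is my own name, __builtin_unreachable is the GCC/Clang builtin) is to hand the optimizer the no-wrap assumption by hand:

  #include <limits.h>

  // Promise GCC/Clang that the unsigned addition never wraps,
  // recovering signed-style assumptions while keeping the unsigned type.
  unsigned add_nowrap(unsigned a, unsigned b) {
      if (a > UINT_MAX - b)
          __builtin_unreachable();   // overflow would be UB, as wished for
      return a + b;
  }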

uecker 4 days ago | parent | next [-]

I recently filed a bug for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116193

OskarS 4 days ago | parent | prev | next [-]

You do need some way to overflow properly, because sometimes that is what you want. A common example would be PRNGs, which frequently rely on overflow (the classic LCG, for instance). You could argue that should just be a library function or something (e.g. `add_with_overflow`), though that's more C++ than C.
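
For instance (sketch mine), the classic Numerical Recipes LCG is just a multiply-add that relies on exactly this mod-2^32 wraparound:

  #include <stdint.h>

  // Each step intentionally wraps modulo 2^32; with unsigned arithmetic
  // that wraparound is well defined.
  uint32_t lcg_next(uint32_t* state) {
      *state = *state * 1664525u + 1013904223u;
      return *state;
  }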

You are absolutely, 100% correct though: I've never seen a case where accidental overflow doesn't start causing bugs anyway. Like, the Pac-Man kill screen is caused by a byte overflowing (it happens on level 256), and the game goes insane. Pac-Man was written in assembly, where overflow is defined behavior, but that doesn't matter at all; the game is still broken. If signed overflow is essentially always a bug anyway, why not make it UB and optimize around it? Especially since it's so valuable for unrolling loops.

People always bring up signed integer overflow as an argument for why UB is scary, and it always seemed like such a bad argument to me. Like, I can understand why people think UB has gone too far in C/C++, but signed overflow is such a bad example. It's one of the most sensible bits of UB in the entire standard, IMHO.

pizlonator 4 days ago | parent | prev [-]

-fwrapv

j16sdiz 4 days ago | parent | prev | next [-]

> First of all, that argument only explains UB of OOB memory accesses at best.

It explains much of the loop-unrolling and integer-overflow UB as well.

gpderetta 4 days ago | parent | prev | next [-]

> nonescaping locals don’t get addresses

Inlining, interprocedural optimizations.

For example, something as trivial as an accessor member function would be hard to optimize.

pjmlp 4 days ago | parent | next [-]

Safer languages manage similar optimizations without having to rely on UB.

gpderetta 4 days ago | parent [-]

Well, yes, safer languages prevent pointer forging statically, so provenance is trivially enforced.

And I believe that provenance is an issue in unsafe Rust.

tialaramex 4 days ago | parent [-]

Unlike C++ and (until Martin's work is moved into the actual ISO language document rather than a separate document) C, the Rust language actually has a definition for how provenance is supposed to work.

https://doc.rust-lang.org/std/ptr/index.html#provenance

The definition isn't deemed complete because of aliasing. AIUI, the definition we have is adequate if you're OK with treating all edge cases for "Is this an alias?" as "Yes", but eventually Rust will also need to carefully nail down all those edge cases so that you can tread closer without falling off.

pizlonator 4 days ago | parent | prev [-]

Inlining doesn’t require UB

gpderetta 4 days ago | parent [-]

I didn't claim that. What I mean is that if a pointer escapes into an inlined function and no further, it will still prevent further optimizations if we apply your rule that only non-escaping locals don't get addresses. The main benefit of inlining is that it is effectively a simple way to do interprocedural optimization. E.g.:

  inline void add(int* to, int what) { *to += what; }
  void foo();        // separately compiled; never sees &x
  int bar() {
      int x = 0;
      add(&x, 1);    // &x goes into add, but add is inlined
      foo();
      return x;      // can this be folded to `return 1`?
  }
By your rules, optimizing bar to return the constant 1 would not be allowed.
pizlonator 4 days ago | parent [-]

I think you’re applying a very strange strawman definition to “nonescaping”. It’s certainly not the definition I would pick.

The right definition is probably something like:

- pointers that come out of the outside world (syscalls) are escaped. They are just integers.

- pointers to locals have provenance. They point to an abstract location. It is up to the implementation to decide when the location gets an integer value (i.e., an actual address) and what that value is. The implementation must do this no later than when the pointer to the local escapes.

- pointer values passed to the outside world (syscalls) escape.

- pointer values stored in escaped memory also escape, transitively

That’s one possible definition that turns the UB into implementation-defined behavior. I’m sure there are others.
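
To make that concrete (my sketch; opaque stands for any function the implementation can’t see through):

  void opaque(int* p);   // pointers passed here escape

  int folded(void) {
      int x = 1;         // &x never escapes: no address need ever exist,
      return x;          // so folding this to `return 1` is always fine
  }

  int materialized(void) {
      int x = 1;
      opaque(&x);        // &x escapes: a real address must exist no later
      return x;          // than this call, and x is reloaded afterward
  }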

gpderetta 4 days ago | parent [-]

I think you have a non-standard definition. An escaping pointer is an address that the compiler cannot fully track (directly or indirectly). It could escape to a syscall, to a separately compiled function (without LTO), or even to a function in the same translation unit if the compiler cannot inline that function or do sufficient interprocedural analysis.

Again, I'm not a compiler writer, but my understanding is that non-escaping variables can be optimized in SSA form, while escaped variables are treated as memory and the compiler must be significantly more conservative.

In any case, whether a pointer escapes or not depends purely on the compiler's capabilities and optimization level, so it would not be sane to make code well defined or UB depending on the compiler or optimization level.

edit: to be more concrete, do you think that in my example the constant folding of the return into return 1 should be allowed? And if so, which variant of this code would prevent the optimization and why?

pizlonator 4 days ago | parent [-]

> Again, I'm not a compiler writer

I am a compiler writer.

The definition I gave in my post is general enough to cover all possible compilers (ones that have LTO, ones that are inside a DBT, etc).

Yes, the constant folding should be allowed, because the pointer to the local never escaped.
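
The variant that would block the fold is the one where the pointer actually leaves (sketch, reusing your example):

  inline void add(int* to, int what) { *to += what; }
  void foo2(int* p);   // this time the pointer is handed to unknown code

  int bar2(void) {
      int x = 0;
      add(&x, 1);      // no escape: add is fully visible
      foo2(&x);        // &x escapes here, so x needs a real address
      return x;        // folding this to `return 1` is no longer valid
  }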

tialaramex 4 days ago | parent | prev [-]

> Second, you could define the meaning of OOB by just saying “pointers are integers"

This means losing a lot of optimisations, so in fact when you say you "don't buy" this argument you only mean that you don't care about optimisation. Which is fine, but it does mean the "improved" C isn't very useful in a lot of applications; might as well choose Java.

pizlonator 4 days ago | parent [-]

> This means losing a lot of optimisations

You won’t lose “a lot” of optimizations, and you certainly won’t lose enough for it to make a noticeable difference in any workload that isn’t SPEC.