Remix.run Logo
bestouff 4 hours ago

The problem of UB is not really that it may crash in some architecture. The real problem is that the compiler expects UB code to NOT happen, so if you write UB code anyway the compiler (and especially the optimizer) is allowed to translate that to anything that's convenient for its happy path. And sometimes that "anything" can be really unexpected (like removing big chunks of code).

1718627440 an hour ago | parent | next [-]

Yes, that is a problem, but this is also the most useful feature and reason for UB. People that suggest to just define it or make it unspecified, miss, that the compiler being able to remove whole parts of a program is the point. When I write code, that is UB for certain inputs, it is because I do not intend the program to have any behaviour for these inputs. I do want the compiler to optimize those away or do anything that effects from the behaviour of the other defined cases. It is deeply satisfying to add some conditions triggering log strings and see that they do not occur in the binary, because they can be only reached via UB.

inkysigma 4 hours ago | parent | prev | next [-]

One example along this path as an example is that every function must either terminate or have a side effect. I don't think one has bitten me yet but I could completely see how you accidentally write some kind of infinite loop or recursion and the function gets deleted. Also, bonus points for tail recursion so this bug might only show up with a higher optimization level if during debug nothing hit the infinite loop.

account42 2 hours ago | parent | next [-]

Infinite loop without side effects == program stuck and not responding on user input and not outputting anything. That's not something a useful program will ever want to do.

Certhas an hour ago | parent | next [-]

Not true, C++ made it so trivial infinite loops are not UB because it turns out they do have legitimate uses.

https://lists.isocpp.org/std-proposals/2020/05/1322.php

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p28...

account42 an hour ago | parent [-]

Yes, the C++ committee has been making some stupid decisions lately. This is not the only one.

Low level platform-specific code that needs to hot spin until an interrupt happens can use assembly for that part which it will need to do for the interrupt handler anyway.

zarzavat an hour ago | parent | prev | next [-]

https://9p.io/sources/plan9/sys/src/libc/9sys/abort.c

account42 an hour ago | parent [-]

This is already UB without an infinite loop.

xigoi an hour ago | parent | prev [-]

The problem is when you accidentally write an infinite loop. In a different language, you run the code, see that it gets stuck and fix it. In C, the compiler may delete the function, making it hard to realize what is happening.

account42 an hour ago | parent | next [-]

This is not a problem that C or C++ programmers actually encounter, ever.

1718627440 an hour ago | parent | prev [-]

Note, that this is not true for C.

1718627440 an hour ago | parent | prev [-]

That's only true in C++ though, not in C.

dzaima 37 minutes ago | parent [-]

C does allow unconditional infinite loops (e.g. "while (1) { }" isn't UB) but still is UB if the controlling expression isn't constant (e.g. "while (two < 10) { }" is UB if two is less than 10)

eru 3 hours ago | parent | prev | next [-]

Yes, a crash is about the most benign UB: at least it's highly visible.

In worse scenarios, your programme will silently continue with garbage, or format your hard disk or give attackers the key to the kingdom.

rando1234 2 hours ago | parent | prev | next [-]

The point in the article that 'It's not about optimisations' really got my attention. I've previously done some work where we wrote an analysis pass under the assumption that it executed last in the transformation pipeline and this was needed for correctness. The assumption was that since no further optimisations happened it was safe. Now I'm not so sure...

account42 2 hours ago | parent | prev | next [-]

That's a feature, not a problem.

anilakar 4 hours ago | parent | prev [-]

Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).

Another commenter suggested using LLMs, but I disagree. Having clangd emit warning squiggles for unchecked operations (like signed addition) would be a good start.

flohofwoe 3 hours ago | parent | next [-]

> Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).

Dead code elimination is essential for performance, especially when using templates (this is basically what enables the fabled "zero cost abstraction" because complex template code may generate a lot of 'inactive' code which needs to be removed by the optimizer).

The actual issue is that the compiler is free to eliminate code paths after UB, but that's also not trivial to fix (and some optimizations are actually enabled by manually injecting UB (like `__builtin_unreachable()` which can make a measurable difference in the right places).

1718627440 an hour ago | parent [-]

> The actual issue is that the compiler is free to eliminate code paths after UB

Not, that the compiler can also emit code paths before UB, as UB is a property of the whole program, not just of a single statement.

amoss 4 hours ago | parent | prev | next [-]

Dead code elimination is run multiple times, including after other optimizations. So code that is not initially dead may become dead after propagating other information. Converting dead code into an error condition would make most generic code that is specialized for a particular context illegal.

gpderetta 2 hours ago | parent | prev | next [-]

Consider:

   enum op_t{ add, mul };
   int exec(op_t op, int a, int b) {
       if(op == add) { return a+b; }
       if(op == mul) { return a\*b; }
   }

   c = exec(add, a,b);
Should be the compiler be prevented from inlining exec and constant-propagating op and removing the mul branch? What about if a and b are constants and the addition itself is optimized away?
4gotunameagain 4 hours ago | parent | prev [-]

This is trickier than it initially seems. Using preprocessor directives to include or exclude swaths of code is a very common thing, and implementing a compiler error as you described would break the building of countless C codebases.