Remix.run Logo
dmitrygr 4 hours ago

I stoped reading about here:

    > bool parse_packet(const uint8_t* bytes) {
    >   const int* magic_intp = (const int*)bytes;   // UB!
Author, if you are reading this, please cite the spec section explaining that this is UB. Dereferencing the produced pointer may be UB, but casting itself is not, since uint8_t is ~ char and char* can be cast to and from any type.

you might try to argue that uint8_t is not necessarily char, and while it is true that implementations of C can exist where CHAR_BIT > 8, but those do not have uint8_t defined (as per spec), so if you have uint8_t, then it is "unsigned char", which makes this cast perfectly safe and defined as far as i can tell. Of course CHAR_BIT is required to be >= 8, so if it is not >8, it is exactly 8. (In any case, whether uint8_t is literally a typedef of unsigned char is implementation-defined and not actually relevant to whether the cast itself is valid -- it is)

raphlinus 4 hours ago | parent | next [-]

The issue is not type punning (itself a very common source of UB), but the fact that the `bytes` pointer might not be int-aligned. The spec is clear that the creation (not just the dereferencing) of an unaligned pointer is UB, see 6.3.2.3 paragraph 7 of the C11 (draft) spec.

Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.

flohofwoe 3 hours ago | parent | next [-]

> Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.

A "world-class expert in low level programming" knows that unaligned memory accesses are no problem anymore on most modern CPUs, and that this particular UB in the C standard is bogus and needs to fixed ;)

formerly_proven 3 hours ago | parent [-]

… it’s only UB if the pointer is actually misaligned. It’s not possible to tell from these two lines whether that’s the case.

gritzko 4 hours ago | parent | prev | next [-]

C of course is ancient. It remembers the Cambrian explosion of CPU architectures, twelve-bit bytes and everything like that. I wonder if it is possible to codify some pragmatic subset of it that works nicely on currently available CPUs. Cause the author of the piece goes back in time to prove his point (SPARCs and Alphas).

dmitrygr 4 hours ago | parent [-]

Fun story: even the latest C spec doesn’t require CHAR_BIT == 8, but it does now codify 2s complement int representation. (IIRC)

eru 3 hours ago | parent [-]

For unsigned ints, or also for signed ints?

account42 an hour ago | parent | next [-]

Two's complement is a representation specifically for signed integers.

dmitrygr 3 hours ago | parent | prev [-]

For signed. Unsigned overflow was defined for a while now.

gblargg an hour ago | parent [-]

And unsigned negation is two's complement negation as well (-u = 0-u).

3 hours ago | parent | prev | next [-]
[deleted]
dmitrygr 4 hours ago | parent | prev [-]

That cast is valid. Spec does not guarantee same bit sequence for resulting pointer and source pointer. But as the cast is explicitly allowed, it is not UB. Compiler is free to round the pointer down. Or up. Or even sideways. All ok. Dereferencing it — indeed not ok. But the cast is explicitly allowed and not UB.

Pointer casts changing pointer bit sequences is common on weird platforms (eg: some TI DSPs, PIC, and aarch64+PAC). And it is valid as per spec. Pointer assignment is not required to be the same as memcpy-ing the pointer unto a pointer to another type.

You misunderstood the spec. No promises are made that that cast copies the pointer bit for bit (and thus creates an invalid pointer). Therefore, your objection to invalid pointers is null and void. :)

raphlinus 4 hours ago | parent [-]

I'm not assuming anything about bit representations. In this case, the spec language is quite clear and unambiguous.

6.3.2.3 paragraph 7: A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned[footnote 68]) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

This is a subsection of section 6.3 which describes conversions, which include both implicit and conversions from a cast operation. This language is not saying anything about bit representations or derefencing.

I happen to be wearing my undefined behavior shirt at the moment, which lends me an extra layer of authority. I'm at RustWeek in Utrecht, and it's one of my favorite shirts to wear at Rust conferences. But let's say for the sake of argument that you are right and I am indeed misunderstanding the spec. Then the logical conclusion is that it's very difficult for even experienced programmers to agree on basic interpretations of what is and what isn't UB in C.

dmitrygr 3 hours ago | parent [-]

I do not see there a promise that the cast will produce an invalid pointer, nor anything prohibiting the compiler from rounding the pointer down, thus producing a valid one. “Converted” does not require bit copy. I don’t see how this interpretation is against any section of the spec.

dwattttt 2 hours ago | parent | next [-]

I also do not see any requirement in the quoted text that the casted pointer be dereferenced before noting "the behavior is undefined".

In practice performing a cast doesn't really do much until you dereference, but without a carve out in the spec, it does really mean "the behavior is undefined".

3 hours ago | parent | prev | next [-]
[deleted]
pjc50 2 hours ago | parent | prev | next [-]

> rounding the pointer down, thus producing a valid one

A "valid" pointer to the wrong object?

cyclopeanutopia 3 hours ago | parent | prev [-]

> Otherwise, when converted back again, the result shall compare equal to the original pointer.

Doesn't this part exclude the possibility of rounding down?

thomashabets2 an hour ago | parent | prev | next [-]

Author here.

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned71) for the referenced type, the behavior is undefined.

C23 6.3.2.3p7.

stevenhuang 4 hours ago | parent | prev [-]

Byte and int has different alignment requirements. It is UB the moment you make such a ptr.

Great way to demonstrate the point of the article.

gritzko 3 hours ago | parent | next [-]

That better be marked "historical". At least, Lemire says:

On recent Intel and 64-bit ARM processors, data alignment does not make processing a lot faster. It is a micro-optimization. Data alignment for speed is a myth. // https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...

(while in the olden days, a program may crash on unaligned access, esp on RISC)

eru 3 hours ago | parent [-]

Don't mix up what processors do with what the C standard allows you to get away with.

flohofwoe 3 hours ago | parent [-]

...and don't mix up the C standard with what actually existing compilers allow you to get away with ;) In the end the standard is merely a set of guidelines. What matters is how compiler toolchains behave in the real word, and breaking code which does unaligned memory accesses by 'UB exploitation' would be quite insane.

dmitrygr 3 hours ago | parent | prev [-]

Without memcpy there is no guarantee that that line produces an invalid pointer

I don’t see what spec part would prohibit that cast from validly compiling to

   BIC r3, r0, #3
Spec only guaranteed round-trip through char* of properly aligned for type pointers. This doesn’t break that.