trumpdong 3 hours ago

There's a lot of UB in C-family execution models, some of which is not actually UB because the implementation defines it - e.g. aligned DWORD-sized memory access is atomic on Windows because Microsoft said it is.
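If you would rather not lean on a platform promise like that one, C11's `<stdatomic.h>` puts the atomicity into the type itself, so any conforming compiler has to honor it regardless of OS. A minimal sketch; `shared_counter`, `publish` and `observe` are hypothetical names, not from the thread:

```c
#include <stdatomic.h>

/* Hypothetical shared variable. Declaring it atomic_int makes every access
   atomic by the language's own rules, rather than by a platform guarantee
   about aligned 32-bit loads and stores. */
static atomic_int shared_counter = 0;

/* Store a value with sequentially consistent ordering (the default). */
void publish(int value)
{
    atomic_store(&shared_counter, value);
}

/* Load the current value, also sequentially consistent. */
int observe(void)
{
    return atomic_load(&shared_counter);
}
```

Weaker orderings (e.g. `atomic_load_explicit(&shared_counter, memory_order_acquire)`) are available when sequential consistency is more than you need.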

By choosing to use this language, you choose to navigate the UB. Otherwise you'd be writing in Go or Python.

It is possible to write reliable code despite the presence of UB in a language, just like it's possible to drive to work every day for 20 years even though most of the directions you could point the car in lead to an immediate crash. That's a needle with a much thinner eye than UB in C, and most people manage it. Mainly it means being very careful about lifetime and ownership. The Linux kernel manages it 99% of the time simply by being careful about lifetime and ownership, and that's a project with a huge number of contributors who don't intimately know each other's modules. In the Linux kernel you can't just say "new whatever" - you must have a plan for the lifetime of that whatever, and other people will review it.

I agree with you about std::span.

pjc50 an hour ago

> Some of which is not actually UB because the implementation defines it

No - if something is UB in the spec, it's UB. The implementation will do something, sure, but what it does is not fixed and may even change based on compiler version and optimization level.

> DWORD-sized memory access is atomic on Windows because Microsoft said it is

Well, Intel said it is. Mind you, I don't think there are any 32-bit native architectures where aligned dword access isn't atomic. Unaligned, on the other hand...
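For the unaligned case, the usual portable way to read a 32-bit value from an arbitrary byte offset is to `memcpy` it into a local instead of casting the pointer. A sketch; `load_u32_unaligned` is a hypothetical helper, and note that this makes the access well defined, not atomic:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value (in host byte order) from a possibly misaligned
   address. Converting p to a misaligned uint32_t* and dereferencing it is
   UB in C; copying the bytes with memcpy is well defined. Optimizing
   compilers typically turn this into a single load on targets that
   tolerate misaligned access. The read is NOT guaranteed to be atomic. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```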

trumpdong 22 minutes ago

"Undefined behavior" in the C standard literally means "behavior which this C standard does not put any requirements on" - it says so in the definitions section of the C standard. Other things can still put requirements on it. MSVC isn't just a C++ compiler - it's a C++ compiler for x64 Windows and therefore follows the rules of C++, x64, and Windows all at once.

simiones an hour ago

> No - if something is UB in the spec, it's UB.

A compiler is still free to ignore the spec and declare that something is not UB. However, this is very much compiler based, not platform based. Windows might guarantee that aligned DWORD-sized memory accesses are atomic, but that doesn't mean Clang when compiling for Windows would respect this - but MSVC might.

arcticbull 2 hours ago

Yeah but also, quick question:

  #include <string.h>

  struct S {
      char c;
      int i;
  };

  int quiz(void)
  {
      struct S a = {0};
      struct S b = {0};

      return memcmp(&a, &b, sizeof(a)); /* == ...? */
  }

If you answered 0, you'd be wrong: the answer is undefined, thanks to padding, initialization and alignment rules. Padding bytes are undefined, and they're not guaranteed to be initialized to zero even if the variable is declared static (where the members would be zeroed).
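The padding in question is easy to inspect with `offsetof` and `sizeof`. A small sketch using the same `struct S`; the two helper functions are hypothetical:

```c
#include <stddef.h>

struct S {
    char c;
    int i;
};

/* Bytes of padding between c and i. The standard only says this may be
   nonzero; the exact amount depends on the ABI's alignment for int. */
size_t padding_after_c(void)
{
    return offsetof(struct S, i) - (offsetof(struct S, c) + sizeof(char));
}

/* Bytes of padding after i, at the end of the struct (often zero). */
size_t trailing_padding(void)
{
    return sizeof(struct S) - (offsetof(struct S, i) + sizeof(int));
}
```

On a typical x86-64 target with 4-byte `int`, `sizeof(struct S)` is 8 and three of those bytes are padding, which is exactly what a `memcmp` over the whole struct ends up reading.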

This is why the compiler is angry at the post writer, and why the reinterpret_cast is needed. Ideally, if they wanted to do something with the data, they'd unbox the structure.
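Reading "unbox the structure" as working with the members individually, an equality check for the quiz above that never touches padding might look like this. A sketch; `s_equal` is a hypothetical name:

```c
#include <stdbool.h>

struct S {
    char c;
    int i;
};

/* Compare two S values member by member. Unlike memcmp over
   sizeof(struct S) bytes, this never reads padding, so the result depends
   only on the values of c and i. */
bool s_equal(const struct S *a, const struct S *b)
{
    return a->c == b->c && a->i == b->i;
}
```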

That's why it's not a good idea to use void* to pass arbitrary data interchangeable with bytes. It's a location; it makes no representation as to what's there or how to interact with it, let alone who owns it.

std::span solves two problems here. One is the ownership problem. The other is that span<T> is a T[], while void* points at God only knows what.

The post asserts:

> The code is very clear and straightforward: you pass a pointer to the custom data structure, and its size in bytes. That’s it. Simple and clear.

This is unfortunately entirely false in C, thanks to the aforementioned alignment/padding UB (and, of course, inner pointers). std::span addresses this: to get the UB, you'd have to explicitly reinterpret_cast your structure.

> Why should people complexify and uglify their C++ code with the uint8_t pointer (or std::byte), when void* works just fine??

tl;dr: because it doesn't. It just kinda looks like it does if you squint, and it's going to lead to the gnarliest bugs in the world.

saagarjha 39 minutes ago

Padding bytes are initialized to zero if you zero initialize the aggregate. It is hard to keep those bytes as zero but at initialization this much is guaranteed.

porridgeraisin an hour ago

> even if the variable is declared static

No, for static even padding bytes are zero.

For automatic variables, yes: it may effectively turn a = {} into a.member = 0, leaving the padding bytes uninitialised. Or, on copies like a = b, it may not copy the padding bytes.
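The static case can be checked directly: C11 (6.7.9p10) says an object with static storage duration and no explicit initializer is zero-initialized, padding included. A sketch; `s1`, `s2` and `compare_static_objects` are hypothetical names:

```c
#include <string.h>

struct S {
    char c;
    int i;
};

/* Static storage duration, no initializer: members AND padding are
   initialized to zero bits. */
static struct S s1;
static struct S s2;

/* Compare the full object representations, padding bytes included. */
int compare_static_objects(void)
{
    return memcmp(&s1, &s2, sizeof s1);
}
```

This only holds while the objects are untouched: once a value is stored into the struct or one of its members, C11 (6.2.6.1p6) lets the padding bytes take unspecified values again, which is the "hard to keep those bytes as zero" part.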

repelsteeltje 2 hours ago

There is a difference between UB in C and something being undefined in some version of Microsoft C on Windows.

Much of C's UB is specifically and intentionally left undefined in the standard to express that code relying on some specific way it is handled is not proper, portable C. Indeed, the DWORD-sized memory access being atomic doesn't apply to MS Windows prior to version 3.0 running on an 80286.

It's UB because the ISO C spec says it's UB.