| The previous language might leave a C compiler developer very confused, because it seems as though they can choose something else, but what that something is isn't specified. Almost invariably they'll eventually realise: oh, it's just badly worded and didn't mean "should" there. It's like one of those tricky self-referential parlor-box statements. "The statement on this box is not true"? Thanks, I guess. But that's a game, and the puzzles are supposed to be like that, whereas the mission of the ISO document is not to confuse people, so it's good that it is being improved. |
| |
▲ | uecker 4 days ago | parent [-] | | Most of the "ghosts" are indeed just cleaning up the wording. But compiler writers have historically used any excuse that the standard is not clear to justify aggressive optimization. This ranges from an overreaching interpretation of UB itself to wacky concepts such as time travel and wobbly numbers, to incorrect implementations of aliasing (e.g. still in clang) and of pointer-to-integer round trips. | | |
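(As an aside, a minimal sketch of the pointer-to-integer round trip being referred to, with names of my own choosing: the standard says a pointer converted to uintptr_t and back compares equal to the original and may be used in its place, but optimizers have sometimes treated the round-tripped pointer as though it lost its provenance.)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 1;
        int *p = &x;
        uintptr_t i = (uintptr_t)p;  /* pointer -> integer */
        int *q = (int *)i;           /* integer -> pointer, same address */
        *q = 2;                      /* must alias x */
        printf("%d\n", x);           /* expected output: 2 */
        return 0;
    }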
▲ | tialaramex 4 days ago | parent [-] | | I'm sure the compiler authors will disagree that they were "using any excuse". From their point of view they were merely making transformations between equivalent programs, and so any mistake is either that these are not in fact equivalent programs because they screwed up - which is certainly sometimes the case - or the standard should not have said they were equivalent but it did. One huge thing they have on their side is that their implementation is concrete. Whatever it is that, say, GCC does is de facto a thing a compiler can do. The standards bodies (and WG21 has been worse by some margin, but they're both guilty) may standardize anything, but concretely the compiler can only implement some things. "Just do X" where X isn't practical works fine on paper but is not implementable. This was the fate of the Consume ordering. Consume/Release works fine on paper; you "just" need whole-program analysis to implement it. Well, of course that's not practical, so it's not implemented. | | |
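(For concreteness, a minimal consume/release sketch in C11 atomics, with assumed names: on paper the data dependency from the consume load is enough to order the payload read on weakly ordered CPUs, but in practice compilers simply promote consume to acquire because tracking the dependency chain would need whole-program analysis.)

    #include <stdatomic.h>

    struct node { int payload; };

    _Atomic(struct node *) shared;

    void publish(struct node *n)
    {
        n->payload = 42;
        atomic_store_explicit(&shared, n, memory_order_release);
    }

    int read_payload(void)
    {
        struct node *n = atomic_load_explicit(&shared, memory_order_consume);
        /* The payload read is data-dependent on the loaded pointer, so
           consume is (in theory) enough to observe the published 42. */
        return n ? n->payload : -1;
    }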
▲ | uecker 4 days ago | parent [-] | | They sometimes screwed up, sometimes just because of bugs, or because different optimization passes had assumptions that are inconsistent with each other. This somewhat contradicts your second point: compilers have things implemented which may be concrete in some sense (because it is in a compiler), but still not really a "thing", because it is a mess nobody can formalize into a coherent set of rules. But then, they also sometimes misread the standard in ways I can't really understand. This can often be seen when the "interpretation" changes over time: earlier compilers (or even earlier parts of the same compiler) implement the standard as written, then some new optimization pass has a creative interpretation. | | |
▲ | tialaramex 4 days ago | parent [-] | | Certainly compiler developers are only human, and many of them write C++, so they're humans working with a terrible programming language; I wouldn't sign up for that either (I have written small contributions to compilers, but not in C++). I still don't see "any excuses". I see the more usual human laziness and incompetence; LLVM, for example, IMNSHO doesn't work hard enough to ensure their IR has coherent semantics and to deliver on those semantics. The compiler bug I'm most closely following, and which I suspect you have your eye on too, is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119472 aka https://github.com/rust-lang/rust/issues/107975 https://github.com/llvm/llvm-project/issues/45725 But it seems like it's just that everybody fucked this up in similar ways - that's two different major compiler backends! I wouldn't be surprised if Microsoft (whose code we can't see) found that they don't get this quite right either. | | |
▲ | uecker 4 days ago | parent [-] | | I do not imply bad intentions, but I see the arguments brought forward in WG14. It got better in recent years, but we had to push back against some rather absurd interpretations of the standard, e.g. that unspecified values can change after they are selected. Your example shows something else: the standard is simply not seen as very important. The standard is perfectly clear about how pointer comparison works, and yet that alone is not reason enough to invest resources into fixing this if it is not shown to cause actual problems in real code. | | |
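(A simplified sketch of the kind of comparison in the linked reports, not the exact test case: a one-past-the-end pointer to one object compared for equality with a pointer to another object. The standard says equality compares addresses, yet optimizers have folded the comparison based on provenance while printing identical addresses.)

    #include <stdio.h>

    int main(void)
    {
        int a = 1, b = 2;
        int *p = &a + 1;   /* one past the end of a: a valid pointer */
        int *q = &b;
        /* If b happens to sit right after a in memory, the standard says
           p == q is true; compilers have folded this to 0 based on
           provenance while still printing equal addresses. */
        printf("p == q: %d\n", p == q);
        printf("p = %p, q = %p\n", (void *)p, (void *)q);
        return 0;
    }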
▲ | tialaramex 3 days ago | parent [-] | | > [... absurd interpretations] unspecified values can change after they are selected It seems hard to avoid this without imputing a freeze semantic, which would be expensive on today's systems. Maybe I don't understand what you mean? Rust considered insisting on the freeze semantic and was brought up short by the cost, as I understand it. | | |
▲ | uecker 3 days ago | parent [-] | | I do not see how this adds a substantial cost, it is required for C programs to work correctly, and the C standard carefully describes the exact situations where unspecified values are chosen - so the idea that the compiler is then free to break this is clearly in contradiction to the wording. Clang got this wrong and has, I assume, mostly fixed it, because non-frozen values caused a lot of inconsistency and other bugs. | | |
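(To make the disagreement concrete, a sketch of the "wobbly value" problem; this is my example rather than anything from WG14. u starts with an unspecified value, but once it has been read into a there is one specific value, so the two printed lines must agree. A compiler that rematerializes the uninitialized read can end up printing two different numbers.)

    #include <stdio.h>

    int main(void)
    {
        unsigned char u;          /* unspecified value; unsigned char has
                                     no trap representations */
        unsigned char *p = &u;    /* address taken, so reading u is not UB
                                     under C11 6.3.2.1p2 */
        unsigned char a = *p;     /* a now holds one specific value */
        printf("%u\n", a % 10u);
        printf("%u\n", a % 10u);  /* must print the same digit as above */
        return 0;
    }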
▲ | tialaramex 3 days ago | parent [-] | | Like I said, maybe I'm not understanding which "unspecified values" we're talking about. The freeze semantic is a problem when we've said only that we don't know what value is present (typically one or more mapped but unwritten bytes), and so, since we never wrote to this RAM, the underlying machine feels confident to just change what is there. Which means that saying "no" isn't on the compiler per se; the OS and machine (if virtual) might be changing this anyway. If you know of Facebook's magic zero string terminator bug, that's the sort of weird symptom you get. But maybe you're talking about something else entirely? | | |
▲ | uecker 3 days ago | parent [-] | | No, but jemalloc uses a kernel API that has this behavior, and IMHO it is then non-conforming (when using this API, which I think is configurable). The Facebook bug should be taken as a clear sign that this behavior is a terrible idea and not something to be blessed by modifying the standard. When the original kernel API was introduced, it was already pointed out that the behavior is not ideal. There is no fundamental reason (including performance reasons) this has to behave in this way. It is just bad engineering. | | |
▲ | tialaramex 3 days ago | parent [-] | | But far from "the compiler shouldn't allow this", what we're talking about here is platform behaviour. My impression is that virtual machines often just do this, so it may be that even your OS has no idea either. | | |
▲ | uecker 3 days ago | parent [-] | | Virtual machines do not change memory behind your back without your permission. The issue with jemalloc is a very specific problem with a specific Linux API, namely MADV_FREE, which has the problematic behavior that it only re-commits a page when it is written to, not already when it is read. When using this API, jemalloc is not a conforming implementation of malloc. We cannot weaken the language semantics every time someone implements something broken. Why MADV_FREE behaves like this is unclear to me; it was criticized the moment it was introduced into the kernel. But the main problem is using it for a memory allocator in C. |
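(For readers unfamiliar with the API, a Linux-specific sketch of the hazard: after madvise(MADV_FREE) the kernel may discard the page lazily, and only a subsequent write cancels that, so a plain read may see either the old contents or a zero-filled page.)

    #define _DEFAULT_SOURCE  /* for MADV_FREE / MAP_ANONYMOUS with glibc */
    #include <sys/mman.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t len = 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;

        strcpy(p, "hello");
        madvise(p, len, MADV_FREE);  /* page may now be reclaimed lazily */

        /* This read may print "hello" or an empty string, depending on
           whether the kernel has already dropped the page; only a write
           would pin the contents again. */
        printf("%s\n", p);

        munmap(p, len);
        return 0;
    }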