uecker 4 days ago

They sometimes screwed up, sometimes just because of bugs, or because different optimization passes had different assumptions that are inconsistent. This somewhat contradicts your second point. Compilers have some things implemented which may be concrete in some sense (because they are in a compiler), but are still not really a "thing", because they are a mess nobody can formalize with a coherent set of rules.

But then, they also sometimes misread the standard in ways I can't really understand. This can often be seen when the "interpretation" changes over time: earlier compilers (or even earlier parts of the same compiler) implement the standard as written, while some new optimization pass has a creative interpretation.

tialaramex 4 days ago

Certainly compiler developers are only human, and many of them write C++, so they're humans working with a terrible programming language. I wouldn't sign up for that either (I have written small contributions to compilers, but not in C++). I still don't see "any excuses". I see the more usual human laziness and incompetence; LLVM, for example, IMNSHO doesn't work hard enough to ensure its IR has coherent semantics and to deliver on those semantics.

The compiler bug I'm most closely following, and which I suspect you have your eye on too is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119472 aka https://github.com/rust-lang/rust/issues/107975 https://github.com/llvm/llvm-project/issues/45725

But it seems like everybody just fucked this up in similar ways; that's two different major compiler backends! I wouldn't be surprised if Microsoft (whose code we can't see) finds that it doesn't get this quite right either.

uecker 4 days ago

I do not imply bad intentions, but I see the arguments brought forward in WG14. It got better in recent years, but we had to push back against some rather absurd interpretations of the standard, e.g. that unspecified values can change after they are selected. Your example shows something else: the standard is simply not seen as very important. The standard is perfectly clear about how pointer comparison works, and yet this alone is not reason enough to invest resources into fixing it if it is not shown to cause actual problems in real code.
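To make "the standard is perfectly clear how pointer comparison works" concrete, here is a minimal C sketch of the equality rule in C11 6.5.9p6. The names `arr_a`, `arr_b`, `ptr_equal` and `report_adjacency` are hypothetical, and this is not a reproduction of the linked bug reports; it only isolates the one case where the standard's answer depends on memory layout (one past the end of one array versus the start of another). That is where an implementation has to stay consistent rather than answer "equal" in one place and "not equal" in another for the same two pointers.

```c
#include <stdint.h>
#include <stdio.h>

/* Two distinct array objects. Whether arr_b happens to sit directly after
   arr_a in memory is decided by the compiler/linker, not by this code. */
int arr_a[4];
int arr_b[4];

/* C11 6.5.9p6: p == q is true iff both are null, both point to the same
   object, both point one past the end of the same array, or one points one
   past the end of an array and the other to the start of an array that
   happens to immediately follow it in the address space. */
int ptr_equal(const int *p, const int *q) {
    return p == q;
}

/* The layout-dependent case: one past the end of arr_a vs. the start of
   arr_b. Per the wording above, the answer must track the actual layout.
   Comparing uintptr_t values stands in for "the actual addresses" here,
   which assumes a flat address space. */
void report_adjacency(void) {
    const int *end_a = arr_a + 4;   /* valid to form and compare, not to dereference */
    const int *start_b = arr_b;
    int eq = ptr_equal(end_a, start_b);
    int same_addr = (uintptr_t)end_a == (uintptr_t)start_b;
    printf("end_a == start_b: %d, same address: %d\n", eq, same_addr);
}
```

Only the cases the standard settles outright (same object, starts of two distinct objects, one-past-the-end of the same array) have a fixed answer; the adjacency line's output depends on the build, and the two printed numbers are expected to agree.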

tialaramex 3 days ago

> [... absurd interpretations] unspecified values can change after they are selected

It seems hard to avoid this without imposing a freeze semantic, which would be expensive on today's systems. Maybe I don't understand what you mean? Rust considered insisting on the freeze semantic and, as I understand it, was brought up short by the cost.

uecker 3 days ago

I do not see how this adds a substantial cost. It is required for C programs to work correctly, and the C standard carefully describes the exact situations where unspecified values are chosen, so the idea that the compiler is then free to break this clearly contradicts the wording. Clang got this wrong and, I assume, has mostly fixed it, because non-frozen values caused a lot of inconsistencies and other bugs.

tialaramex 3 days ago

Like I said, maybe I'm not understanding which "unspecified values" we're talking about. The freeze semantic is a problem when all we've said is that we don't know what value is present (typically one or more mapped but unwritten bytes), and since we never wrote to this RAM, the underlying machine feels free to just change what is there. That means saying "No" isn't up to the compiler per se: the OS and the machine (if virtual) might be changing it anyway. If you know of Facebook's magic zero string terminator bug, that's the sort of weird symptom you get.

But maybe you're talking about something else entirely?

uecker 3 days ago

No, but jemalloc uses a kernel API that has this behavior, and IMHO it is then non-conforming (when using this API, which I think is configurable). The Facebook bug should be taken as a clear sign that this behavior is a terrible idea and not something to be blessed by modifying the standard. When the original kernel API was introduced, it was already pointed out that the behavior is not ideal. There is no fundamental reason (performance included) that it has to behave this way. It is just bad engineering.

tialaramex 3 days ago

But far from "the compiler shouldn't allow this", what we're talking about here is platform behaviour. My impression is that virtual machines often just do this, so even your OS may have no idea.

uecker 3 days ago

Virtual machines do not change memory behind your back without your permission. The issue with jemalloc is a very specific problem with a specific Linux API, MADV_FREE, which has the problematic behavior: it reallocates pages when they are written to, but not already when they are accessed. When using this API, jemalloc is not a conforming implementation of malloc. We cannot weaken the language semantics every time someone implements something broken. Why MADV_FREE behaves like this is unclear to me; it was criticized the moment it was introduced into the kernel. But the main problem is using it for a memory allocator in C.
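A minimal, Linux-only sketch of the MADV_FREE behavior described above, assuming Linux 4.5 or later (where MADV_FREE was added) and glibc headers. The function name `madv_free_demo` is hypothetical, and this is not jemalloc's code; it just follows the madvise(2) man page: a page marked MADV_FREE may be reclaimed lazily, a read does not stop that (a reclaimed page reads back as zeros), and a write cancels the pending free so the caller sees what it just wrote.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS and MADV_FREE in glibc headers */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns the byte read back after writing to a page
   previously marked MADV_FREE, or -1 if the mapping could not be created. */
int madv_free_demo(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 'A', page);   /* dirty the page with known contents */

    /* Tell the kernel the contents are no longer needed. Assumes Linux >= 4.5;
       older kernels reject MADV_FREE with EINVAL. */
    if (madvise(p, page, MADV_FREE) != 0)
        perror("madvise(MADV_FREE)");

    /* Until the page is written again, the kernel may reclaim it at any time.
       This read may see the old 'A'; a later read of the same byte may see 0
       if reclaim happens in between. Reading does not cancel the free. */
    printf("p[0] after MADV_FREE: %d\n", p[0]);

    /* Writing cancels the pending free for this page, so the written byte
       stays. The other bytes are either the old 'A' or 0, depending on
       whether the page was reclaimed before this write. */
    p[1] = 'B';
    int result = p[1];

    munmap(p, page);
    return result;
}
```

Without memory pressure the kernel typically delays the actual freeing, so the first read will usually still show 65 ('A'). That is part of why bugs built on this behavior are rare and hard to reproduce: the "value" of never-rewritten memory only changes when reclaim happens to run between two reads.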