1718627440 4 days ago

> [0]: https://faultlore.com/blah/c-isnt-a-language/#you-cant-actua...

This blog post is full of misconceptions.

It starts by asserting that C defines an ABI, then complains that everything is so complicated because C doesn't actually define one. C is defined in terms of the behaviour of the abstract C machine. As far as C is concerned, there is no ABI. C only prescribes meaning to the language; it does not prescribe how you implement it. The compiler is free to do as it pleases, including choosing the ABI, within some limits.

What defines the ABI is the platform, consisting of the machine and the OS. The OS here includes at least the kernel (or some bare-metal primitives) and the compiler. And the standard C library really IS part of the compiler. That's why GCC vs. Clang or glibc vs. musl always comes with incompatibilities: these ARE different OSs. They can choose to do things the same way, but only because of formal (POSIX, the platform vendor) or informal (GCC and Clang) standards.

Yes, a lot of ABIs are defined with C syntax, but that's because C has terms for these things and isn't too new. You can specify the same thing in the language of your choice and it will describe the same ABI. Yes, int doesn't have a size independent of the platform. But if the specification didn't use C syntax, it would just say "this argument has the same size as described in 'Appendix X Definitions' under 'Platform's default integer size'". Writing "int" is just a shorter notation for exactly that.
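To make that concrete: nothing in this trivial program is platform-specific, yet its output is decided entirely by the target ABI, not by the C standard.

    /* The sizes printed depend on the platform ABI the compiler
       targets, not on anything in this source file. */
    #include <stdio.h>

    int main(void) {
        /* 4 on most modern ABIs, but 2 on many 16-bit targets */
        printf("int:  %zu bytes\n", sizeof(int));
        /* 8 on LP64 (Linux x86-64), 4 on LLP64 (64-bit Windows) */
        printf("long: %zu bytes\n", sizeof(long));
        return 0;
    }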

> You Can’t Actually Parse A C Header

I don't know why the choice to use the compiler to implement parsing a C header is framed as a bad thing. Is relying on write(2) from the kernel a bad thing, as opposed to trying to bypass the kernel? The compiler is what defines the meaning of a header, so why not ask it for the result? If you don't feel like reimplementing the C preprocessor, you can also just parse preprocessed headers. These are self-contained, i.e. they don't require knowing the include paths. Of course this approach comes with the caveat that when the user updates their C compiler, your knowledge becomes outdated or wrong.

I don't know why it is framed as weird that you need a C parser to parse C code. That is the definition of a C parser. You can't write code that parses C and is somehow not a C parser.
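For illustration, a minimal sketch of the "parse preprocessed headers" approach mentioned above, assuming a POSIX system with a cc driver on PATH (the stdlib.h path is only an example):

    /* -E stops after preprocessing, -P drops linemarkers; the
       resulting output needs no include paths to interpret. */
    #include <stdio.h>

    int main(void) {
        FILE *pp = popen("cc -E -P /usr/include/stdlib.h", "r");
        if (!pp) return 1;

        char line[4096];
        while (fgets(line, sizeof line, pp))
            fputs(line, stdout);   /* feed this to your own parser */

        return pclose(pp) == 0 ? 0 : 1;
    }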

> 176 triples. I was originally going to include them all for the visual gag/impact but it’s literally too many for even that.

No, those are ONLY the 176 target triples (the LLVM term; other terms are "GNU tuple" or "GNU type") that your tool supports. There is also no definitive list; it's a syntax for describing the major components of a platform. There are decades of vendors improving their platforms in incompatible ways, so of course the description of this is messy.

See for example: https://news.ycombinator.com/item?id=43698363

And this is the test data for the source of GNU types: https://cgit.git.savannah.gnu.org/cgit/config.git/tree/tests... Note that it contains 1180 types, and of course even that isn't definitive.

> pub type intmax_t = i64;

> A lot of code has completely given up on keeping C in the loop and has started hardcoding the definitions of core types. After all, they’re clearly just part of the platform’s ABI! What are they going to do, change the size of intmax_t!? That’s obviously an ABI-breaking change!

There is a reason it is called intMAX_t! It does not HAVE a definite size; it is the MAXimal size of an integer on that platform. Yes, there are problems nowadays due to ossification, but they come exactly from people like that blog author. When you want your program to have a stable ABI that doesn't change when your platform gains larger integer types, you just don't use intMAX_t!
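A sketch of the distinction, with made-up function names:

    /* Hypothetical API: spell out the width you actually promise
       instead of borrowing intmax_t. */
    #include <stddef.h>
    #include <stdint.h>

    /* Fragile: the ABI of this function silently changes if the
       platform ever gains a wider integer type and intmax_t
       follows it. */
    intmax_t checksum_fragile(const void *buf, size_t len);

    /* Stable: int64_t is exactly 64 bits on every platform that
       defines it, so the ABI cannot drift under existing binaries. */
    int64_t checksum_stable(const void *buf, size_t len);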

> And even then you have the x64 int problem: it’s such a fundamental type, and has been that size for so long, that countless applications may have weird undetectable assumptions about it. This is why int is 32-bit on x64 even though it was “supposed” to be 64-bit: int was 32-bit for so long that it was completely hopeless to update software to the new size even though it was a whole new architecture and target triple!

That is called ossification. When you program in C you are not supposed to care about the sizes. When your program does, your program is broken/non-portable. Yes, this limits the compilers, because they don't want programs to be broken. But that is really the same as e.g. MS Windows catering to a specific program's bugs. This is not a design mistake of C:

> sometimes you make a mistake so bad that you just don’t get to undo it.
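To make the quoted "weird undetectable assumptions" concrete, a hypothetical example of the kind of code that ossified int at 32 bits:

    /* Stuffing a pointer into an int works when int and pointers
       are both 32 bits, and silently truncates on x64. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int x = 0;
        int *p = &x;
        /* fine on 32-bit targets; loses the top 32 bits on x64 */
        int handle = (int)(intptr_t)p;
        printf("pointer %p, handle %d\n", (void *)p, handle);
        return 0;
    }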

aw1621107 4 days ago

> I don't know why the choice to use the compiler to implement parsing a C header is framed as a bad thing.

Not sure I agree with this interpretation, though maybe I'm focusing on a different part of the article than you are. Where are you getting the negative sense from?

That being said, I don't think it's too hard to imagine why someone might be hesitant to use a C/C++ compiler to parse C/C++ headers. For example, it can be a pretty big dependency to take on, it may add friction for devs/users, and integration with your own tool may be awkward and/or an ongoing time sink, especially if you're crossing an FFI boundary or if the API you're using isn't stable (as I believe is the case for LLVM).
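For a sense of what that dependency looks like, a minimal sketch using libclang (the stable C wrapper around Clang; link with -lclang, and "some_header.h" is a placeholder) that lists the function declarations the compiler finds in a header:

    #include <clang-c/Index.h>
    #include <stdio.h>

    static enum CXChildVisitResult visit(CXCursor c, CXCursor parent,
                                         CXClientData data) {
        (void)parent; (void)data;
        if (clang_getCursorKind(c) == CXCursor_FunctionDecl) {
            CXString name = clang_getCursorSpelling(c);
            printf("fn %s\n", clang_getCString(name));
            clang_disposeString(name);
        }
        return CXChildVisit_Continue;
    }

    int main(void) {
        CXIndex idx = clang_createIndex(0, 0);
        CXTranslationUnit tu = clang_parseTranslationUnit(
            idx, "some_header.h", NULL, 0, NULL, 0,
            CXTranslationUnit_None);
        if (!tu) return 1;  /* parse failure */
        clang_visitChildren(clang_getTranslationUnitCursor(tu),
                            visit, NULL);
        clang_disposeTranslationUnit(tu);
        clang_disposeIndex(idx);
        return 0;
    }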

> There is a reason it is called intMAX_t! It does not HAVE a definite size, it is the MAXimal size of an integer on that platform.

I think this somewhat misses the point of the bit you quoted. In context, it's basically saying that grabbing "real" C type info for interop is so painful that people will hard-code "reasonable" assumptions instead.

> When you want your program to have a stable ABI, that doesn't change when your platform supports larger integer types, you just don't use intMAX_t!

My impression is that the problem is less intmax_t changing and more that intmax_t can change out of sync. Even if you assume every use of intmax_t in a public API corresponds to an intentional desire for the bit width to evolve over time, you can still run into nasty issues if you can't recompile everything at once (which is a pretty strong constraint on the C/C++ committees if history is any indication).
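A made-up sketch of that out-of-sync failure mode (libfoo is hypothetical, and so is a platform where intmax_t grows from 64 to 128 bits):

    /* libfoo.h -- hypothetical library header, shipped alongside a
       binary that was built when intmax_t was 64 bits wide. */
    #include <stdint.h>
    intmax_t foo_sum(intmax_t a, intmax_t b);

    /* app.c -- rebuilt after the platform grew intmax_t to 128
       bits. The compiler now passes two 16-byte arguments, but the
       old libfoo binary still reads two 8-byte ones. Nothing fails
       to compile or link; the call just mis-reads its arguments at
       run time. */
    #include "libfoo.h"
    int main(void) { return (int)foo_sum(1, 2); }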