IshKebab 5 days ago

Funny thing I found when I gave up trying to find documentation and read the LLVM source code (seems to be what happened to the author too!): there are actually five components of the triple, not four.

I can't remember what the fifth one is, but yeah... insane system.

Thanks for writing this up! I wonder if anyone will ever come up with something more sensible.

o11c 5 days ago | parent [-]

There are up to 7 components in a triple, but not all are used at once. The general format is:

  <machine>-<vendor>-<kernel>-<libc?><abi?><fabi?>

But there's also <obj>, see below.

Note that there are both canonical and non-canonical triples in use. Canonical triples are output by `config.guess` or `config.sub`; non-canonical triples are input to `config.sub` and used as prefixes for commands.

The <machine> field (1st) is what you're running on, and on some systems it includes a version number of sorts. Most 64-bit vs 32-bit differences go here, except if the runtime differs from what is natural (commonly "32-bit pointers even though the CPU is in 64-bit mode"), which goes in <abi> instead. Historically, "arm" and "mips" have been a mess here, but that has largely been fixed, in large part as a side-effect of Debian multiarch (whose triples only have to differ from GNU triples in that they canonicalize i[34567]86 to i386, but you should use dpkg-architecture to do the conversion for sanity).
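As a rough illustration of that one difference, here is a minimal Python sketch that rewrites the machine field the way the paragraph above describes. The function name `to_multiarch_machine` is made up for this example, and it only covers the i[34567]86 case mentioned above; for real work, dpkg-architecture remains the tool to use.

```python
import re

def to_multiarch_machine(triple: str) -> str:
    """Rewrite a leading i[34567]86 machine field to i386.

    Hypothetical helper: it only illustrates the single difference
    described above and is not a substitute for dpkg-architecture.
    """
    # The lookahead keeps the match anchored to the whole first field.
    return re.sub(r"^i[34567]86(?=-)", "i386", triple)

print(to_multiarch_machine("i686-linux-gnu"))    # machine field rewritten
print(to_multiarch_machine("x86_64-linux-gnu"))  # left unchanged
```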

The <vendor> field (2nd) is not very useful these days. It defaults to "unknown" but as of a few years ago "pc" is used instead on x86 (this means that the canonical triple can change, but this hasn't been catastrophic since you should almost always use the non-canonical triple except when pattern-matching, and when pattern-matching you should usually ignore this field anyway).
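To make the "ignore this field when pattern-matching" advice concrete, here is a small Python sketch using glob-style patterns, similar in spirit to the `case` patterns found in configure scripts. The helper name `is_x86_64_linux_gnu` and the sample triples are just assumptions for the example.

```python
from fnmatch import fnmatchcase

def is_x86_64_linux_gnu(canonical_triple: str) -> bool:
    """Match on machine, kernel and libc while wildcarding the vendor.

    Hypothetical helper for illustration. Note that "*" also matches
    dashes, just as it does in a shell case pattern.
    """
    return fnmatchcase(canonical_triple, "x86_64-*-linux-gnu")

# Both vendor spellings match, so a change in the canonical vendor
# ("unknown" vs "pc") does not break the check.
print(is_x86_64_linux_gnu("x86_64-unknown-linux-gnu"))
print(is_x86_64_linux_gnu("x86_64-pc-linux-gnu"))
print(is_x86_64_linux_gnu("aarch64-unknown-linux-gnu"))
```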

The <kernel> field (3rd) is pretty obvious when it's called that, but it's often called <os> instead since "linux" is an oddity for regularly having a <libc> component that differs. On many systems it includes version data (again, Linux is the oddity for having a stable syscall API/ABI). One notable exception: if a GNU userland is used on a BSD/Solaris system, a "k" is prepended. "none" is often used for freestanding/embedded compilation, but see <obj>.

The <libc> field (main part of the 4th) is usually absent on non-Linux systems, but mandatory for "linux". If it is absent, the dash after the kernel is usually removed, except if there are ABI components. Note that "gnu" can be both a kernel (Hurd) and a libc (glibc). Android uses "android" here, so maybe <libc> is a bit of a misnomer (it's not "bionic") - maybe <userland>?

<abi>, if present, means you aren't doing the historical default for the platform specified by the main fields. Other than "eabi" for ARM, most of this is for "use 32-bit pointers but 64-bit registers".

<fabi> can be "hf" for 32-bit ARM systems that actually support floats in hardware. I don't think I've seen anything else, though I admit the main reason I separately document this from <abi> is because of how Debian's architecture puts it elsewhere.

<obj> is the object file format, usually "aout", "coff", or "elf". It can be appended to the kernel field (but before the kernel version number), or replace it if "none", or it can go in the <abi> field.
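The field layout described above can be sketched in Python as follows. This is a toy under stated assumptions, not how `config.sub` works: it only handles canonical triples with an explicit vendor, it ignores <obj> and kernel version numbers, and the lists of libc and ABI names in the regex are a small illustrative subset. The names `Triple` and `parse_canonical` are made up for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative, incomplete lists; real triples use many more values.
_FOURTH_FIELD = re.compile(
    r"(gnu|musl|uclibc|android)?(eabi|x32|abi64|abin32)?(hf)?"
)

@dataclass
class Triple:
    machine: str
    vendor: str
    kernel: str
    libc: str = ""
    abi: str = ""
    fabi: str = ""

def parse_canonical(triple: str) -> Optional[Triple]:
    """Split <machine>-<vendor>-<kernel>-<libc?><abi?><fabi?>.

    Returns None when the input does not fit this simplified model.
    """
    parts = triple.split("-", 3)
    if len(parts) < 3:
        return None
    machine, vendor, kernel = parts[:3]
    # The fourth field is absent on many non-Linux systems.
    rest = parts[3] if len(parts) == 4 else ""
    match = _FOURTH_FIELD.fullmatch(rest)
    if match is None:
        return None
    libc, abi, fabi = (group or "" for group in match.groups())
    return Triple(machine, vendor, kernel, libc, abi, fabi)

print(parse_canonical("arm-unknown-linux-gnueabihf"))
print(parse_canonical("x86_64-pc-linux-gnux32"))
```

A non-canonical triple such as `x86_64-linux-gnu` would be misread by this sketch (the kernel would land in the vendor slot), which is one reason to canonicalize before pattern-matching.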

IshKebab 5 days ago | parent [-]

Nah I dunno where you're getting your information from but LLVM only supports 5 components.

See the code starting at line 1144 here: https://llvm.org/doxygen/Triple_8cpp_source.html

The components are arch-vendor-os-environment-objectformat.

It's absolutely full of special cases and hacks. Really at this point I think the only sane option is an explicit list of fixed strings. I think Rust does that.

jcranmer 5 days ago | parent | next [-]

You're not really contradicting o11c here; what LLVM calls "environment" is a mixture of what they called libc/abi/fabi. There's also what LLVM calls "subarch" to distinguish between different architectures that may be relevant (e.g., i386 is not the same as i686, although LLVM doesn't record this difference since it's generally less interested in targeting old hardware), and there's also OS version numbers that may or may not be relevant.

The underlying problem with target triples is that architecture-vendor-system isn't sufficient to uniquely describe the relevant details for specifying a toolchain, so the necessary extra information has been somewhat haphazardly added to the format. On top of that, since the relevance of some of the information is questionable for some tasks (especially the vendor field), different projects have chosen not to care about subtle differences, so the normalization of a triple is different between different projects.

LLVM's definition is not more or less correct than gcc's here, nor are these the only definitions floating around.

o11c 5 days ago | parent [-]

Hm, looking to see if the vendor field is actually meaningful ... I see some stuff for m68k and mips and sysv targets ... some of it working around pre-standard vendor C implementations

Ah, I found a modern one:

  i[3456789]86-w64-mingw* does not use winsup
  i[3456789]86-*-mingw* with other vendors does use winsup

There are probably more; this is embedded in all sorts of random configure scripts and it is very not-greppable.

o11c 5 days ago | parent | prev [-]

LLVM didn't invent the scheme; why should we pay attention to their copy and not look at the original?

The GNU Config project is the original.

IshKebab 5 days ago | parent [-]

The article goes into this a bit. But basically because LLVM is extremely popular and used as a backend by lots of other languages, e.g. Rust.

Frankly being the originators of this deranged scheme is a good reason not to listen to GNU!