yukinon 5 days ago

For someone like me who is less versed in these things, could you explain why bootstrapping is a required check for taking a language seriously? My criteria are far less stringent (is it stable? is it popular enough? is the toolchain mature? etc.), so I wonder what I am missing here.

tennysont 5 days ago | parent | next [-]

The Haskell compiler creates a slightly different output every time you compile a program[1]. This makes it difficult to verify that a freely downloadable binary was really built from the published source, and is therefore malware-free. If it were easy to check, then you could rest easy, assuming that someone out there is doing the check for you (and it would be big news if malware were found).
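To make the check concrete, here is a minimal sketch of what "easy to check" could look like: hash the output of several independent builds and see whether they agree. The byte strings are made-up stand-ins for real binaries, and the function names are hypothetical, not part of any existing tool.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 of a build artifact, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(builds: list) -> bool:
    """True when every independent build produced byte-identical output."""
    return len({digest(b) for b in builds}) == 1


# Stand-ins for the same program built on two different machines.
deterministic = [b"\x7fELF...code", b"\x7fELF...code"]

# A build that embeds something volatile (here, a fake timestamp)
# differs from run to run even though the source is unchanged.
nondeterministic = [b"\x7fELF...code@1700000000", b"\x7fELF...code@1700000042"]

print(is_reproducible(deterministic))     # True
print(is_reproducible(nondeterministic))  # False
```

With reproducible output, anyone can rebuild from source and compare their digest against the published binary. Without it, a mismatch tells you nothing.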

If you're a hardened security person, then the conversation continues, and the term "bootstrap" becomes relevant.

Since you do not trust compiled binaries, you can compile programs yourself from the source code (where malware would be noticed). However, in order to compile the Haskell compiler, you must have access to a (recent) version of the Haskell compiler. So, version 10 of the compiler was built using version 9, which was built using version 8, etc. "Bootstrapping" refers (basically) to building version 1. Currently, version 1 was built approximately with smart people, duct tape, and magic. There is no way to build version 1 yourself; you must simply download it.

So if you have high security requirements, then you might fear that years ago, someone slipped malware into the Haskell compiler version 1, which will "self-replicate" into every compiler that it builds.
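A toy model of that fear, as a sketch only: each "binary" is a dict, and a tainted compiler taints whatever it builds, even from clean source. This is a deliberate simplification with hypothetical names; it does not model GHC or any real compiler.

```python
def run_compiler(compiler_binary: dict, source: str) -> dict:
    """Use a compiler binary to build `source` into a new toy binary."""
    return {
        "built_from": source,
        # The source is clean, but a tainted compiler silently
        # re-inserts its payload into everything it produces.
        "tainted": compiler_binary["tainted"],
    }


def build_chain(seed_binary: dict, versions: int) -> list:
    """Build versions 2..N, each compiled by the previous version."""
    chain = [seed_binary]
    for v in range(2, versions + 1):
        chain.append(run_compiler(chain[-1], f"clean compiler source v{v}"))
    return chain


clean_seed = {"built_from": "hand-audited seed", "tainted": False}
bad_seed = {"built_from": "downloaded blob", "tainted": True}

print([b["tainted"] for b in build_chain(clean_seed, 4)])  # [False, False, False, False]
print([b["tainted"] for b in build_chain(bad_seed, 4)])    # [True, True, True, True]
```

Reading the source of every version would never reveal the problem, because the taint lives only in the binaries. That is why the trustworthiness of the very first binary matters.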

Until a few years ago, this was a bit of a silly concern (most software wasn't reproducible) but with the rise of Nix and Guix, we've gotten a lot closer to reproducible-everything, and so Haskell is the odd-one-out.

[1] The term is "deterministic builds" or "reproducible builds". Progress is being made to fix this in Haskell.

romes 5 days ago | parent | next [-]

From 9.12, -fobject-determinism[1] will guarantee deterministic objects.

If it ever doesn't, do open a bug report[2]

[1] https://downloads.haskell.org/ghc/latest/docs/users_guide/us...

[2] https://gitlab.haskell.org/ghc/ghc/-/issues

lrvick 5 days ago | parent [-]

Good to know! Half the battle covered then.

lrvick 5 days ago | parent | prev [-]

Unlike Nix and Guix, Stagex goes much further in that it has a 100% mandate on supply chain integrity. It trusts no single maintainer or computer and disallows any binary blobs. It is thus not possible to package any software that cannot be bootstrapped, reproduced, and signed by at least two maintainers.
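As a rough illustration of the "reproduced and signed by at least two maintainers" rule, here is a sketch of a quorum check over build attestations. It is an assumption about how such a policy could be expressed, not Stagex's actual implementation: the maintainer names and digests are invented, and real signature verification is omitted.

```python
from collections import defaultdict
from typing import Optional


def accepted_digest(attestations: list, quorum: int = 2) -> Optional[str]:
    """Return a digest that at least `quorum` distinct maintainers reproduced.

    `attestations` is a list of (maintainer, digest) pairs. Returns None
    when no digest reaches the quorum.
    """
    signers = defaultdict(set)
    for maintainer, digest in attestations:
        signers[digest].add(maintainer)
    for digest, who in signers.items():
        if len(who) >= quorum:
            return digest
    return None


# Two different maintainers independently got the same digest: accepted.
print(accepted_digest([("alice", "abc123"), ("bob", "abc123")]))    # abc123

# One maintainer attesting twice does not count as two.
print(accepted_digest([("alice", "abc123"), ("alice", "abc123")]))  # None

# Builds that disagree mean the package is not reproducible.
print(accepted_digest([("alice", "abc123"), ("bob", "def456")]))    # None
```

The point of counting distinct maintainers per digest is that no single person or machine can get an artifact accepted on their own.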

Haskell and Ada are the only languages we cannot support, along with any software built with them.

Everything else is just fine though.

I do hope both languages address this though, as it is blocking a lot of important open source software like pandoc or coreboot from being used in security critical environments.

frumplestlatz 5 days ago | parent [-]

How are you bootstrapping a modern C compiler without an existing C/C++ compiler and linker?

lrvick 5 days ago | parent | next [-]

From 180 bytes of human readable machine code all the way up.

https://codeberg.org/stagex/stagex/src/branch/main/packages/...

degamad 5 days ago | parent | prev [-]

In assembly, like stage0 does: https://github.com/oriansj/stage0

lrvick 5 days ago | parent [-]

Technically it is raw x86 machine code in hexadecimal, a scheme called "hex0"
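To give a feel for how little a hex0-style format asks of its reader, here is a sketch of a translator from commented hexadecimal text to raw bytes. It assumes `#` and `;` start line comments and that anything other than hex digits is ignored; it approximates the idea and is not the stage0 implementation. The sample program is an invented fragment.

```python
import string


def hex0_to_bytes(text: str) -> bytes:
    """Translate commented hex text into the raw bytes it spells out."""
    digits = []
    for line in text.splitlines():
        # Drop everything after a comment marker.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        # Keep only hex digits; whitespace and other characters are skipped.
        digits.extend(ch for ch in line if ch in string.hexdigits)
    # Every pair of digits becomes one byte (raises ValueError on an odd count).
    return bytes.fromhex("".join(digits))


sample = """
# toy x86 fragment, invented for illustration
B8 01 00 00 00   ; mov eax, 1
C3               ; ret
"""

print(hex0_to_bytes(sample).hex())  # b801000000c3
```

Because the translation is this mechanical, a human can audit the input by hand, which is what makes it usable as the root of a bootstrap chain.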

Koffiepoeder 5 days ago | parent | prev [-]

I'm not the OP, but for me their comment sparked an association with the famous Ken Thompson lecture 'Reflections on Trusting Trust'. Could be a good starting point.