bee_rider 4 hours ago

Are conventional compilers actually deterministic, with all the bells and whistles enabled? PGO seems like it ought to have a random element.

pjmlp 30 minutes ago

Not at all, when talking about managed runtimes.

That's why it is hard to benchmark with various kinds of GC and dynamic compilers.

You can't even expect deterministic code generation for the same source code across various compilers.

jcranmer 2 hours ago

It is generally considered a bug in a compiler if its output is nondeterministic. Of course, compilers are large, complex beasts, and nondeterminism is so easy to accidentally introduce (e.g., do a "for each" in a map where the key is a pointer), that it's probably not too hard to find cases that have nondeterminism.
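To make the "for each over a pointer-keyed map" example concrete, here is a minimal Python analogy (not real compiler code). In CPython, an object that doesn't define its own hash is hashed from its identity, i.e. its memory address, so iterating a set of such objects can visit them in an order that depends on where they were allocated. The names `Symbol`, `emit_unstable` and `emit_stable` are hypothetical. The usual fix is to iterate in an order derived from the input itself, such as sorting by name.

```python
class Symbol:
    """Hypothetical compiler symbol. It defines no __eq__/__hash__, so set
    membership and iteration order fall back to object identity."""

    def __init__(self, name: str) -> None:
        self.name = name


def emit_unstable(symbols: set) -> list[str]:
    # Order follows identity-based hashes, which can change between runs as
    # allocation addresses change: the analogue of a pointer-keyed map.
    return [s.name for s in symbols]


def emit_stable(symbols: set) -> list[str]:
    # Sort by a property of the source itself, so the order no longer
    # depends on where objects happened to land in memory.
    return [s.name for s in sorted(symbols, key=lambda s: s.name)]


syms = {Symbol(n) for n in ["main", "helper", "init", "cleanup"]}
print(emit_stable(syms))  # ['cleanup', 'helper', 'init', 'main']
```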

> PGO seems like it ought to have a random element.

PGO should be deterministic based on the runs used to generate the profile. The runs are tracking information that should be deterministic: how many times does the branch get taken versus not taken, etc. HWPGO, which relies on hardware counters to generate profiling information, may be less deterministic because the hardware counters end up having some statistical slip to them.
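A toy Python model of the difference, under loudly simplified assumptions: instrumentation counts every branch outcome, so the profile is a pure function of the training inputs, while the "sampled" profile keeps only one event in every `period`, starting at a run-dependent offset. That offset is a hypothetical stand-in for interrupt timing and skid; real hardware sampling is more complicated. All function names here are illustrative.

```python
import random
from collections import Counter


def branch_events(inputs: list[int]) -> list[str]:
    # Hypothetical program with a single branch; record its outcome per input.
    return ["taken" if x % 3 == 0 else "not_taken" for x in inputs]


def instrumented_profile(inputs: list[int]) -> Counter:
    # Exact counts of every outcome: same inputs, same profile.
    return Counter(branch_events(inputs))


def sampled_profile(inputs: list[int], period: int, rng: random.Random) -> Counter:
    # Toy sampling model: keep every `period`-th event from a random offset,
    # so repeated runs over the same inputs can yield different estimates.
    offset = rng.randrange(period)
    return Counter(branch_events(inputs)[offset::period])


training = list(range(100))
assert instrumented_profile(training) == instrumented_profile(training)
print(instrumented_profile(training))  # Counter({'not_taken': 66, 'taken': 34})
```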

vlovich123 3 hours ago

No; modulo bugs, the same set of inputs to a compiler is generally guaranteed to produce the same output bit for bit, which is the definition of determinism.

There are even efforts to guarantee this for many packages on Linux. It's a core security property: because anyone can rebuild from scratch and check that they get the same output, you can verify that the compilation process or environment wasn't illicitly tampered with.

Actually fixing all the inputs and getting deterministic output can be challenging, but that has less to do with the compiler and more to do with completely controlling the entire environment: the profile you use for PGO, paths on the build machine being injected into the binary, programs whose source or build system is itself nondeterministic (e.g. incorporating the build time into the binary), and so on.
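The build-time example is common enough that the reproducible-builds project defines a convention for it: the `SOURCE_DATE_EPOCH` environment variable, which tools that honor it use in place of the current time. Below is a minimal sketch of a hypothetical build step (`build` and `digest` are illustrative names, not a real tool's API) that stamps a time into its output and falls back to the wall clock only when the variable is unset.

```python
import hashlib
import time


def build(source: str, env: dict[str, str]) -> bytes:
    # Hypothetical build step that embeds a build time in its output.
    # If SOURCE_DATE_EPOCH is set, use it instead of the wall clock so the
    # output depends only on the declared inputs.
    stamp = env.get("SOURCE_DATE_EPOCH")
    build_time = int(stamp) if stamp is not None else int(time.time())
    return f"{source}\n// built at {build_time}\n".encode()


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


src = "int main(void) { return 0; }"
pinned = {"SOURCE_DATE_EPOCH": "1700000000"}
assert digest(build(src, pinned)) == digest(build(src, pinned))
```

Without the variable, two builds only match if they happen to run in the same second, which is exactly the kind of environment-dependent input the comment describes.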

123malware321 3 hours ago

Well, considering you use components like DFAs to build compilers, yes, they are deterministic. You also have reproducible builds, etc.

Or does your binary come out differently every time you compile the same file??

Try it: compile the same file 10 times and diff the resulting binaries.

Now try to prompt a bunch of LLMs 10 times and diff the returned rubbish.
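The compile-ten-times experiment can be written as a small harness: run a build repeatedly and count distinct SHA-256 digests of the output. As an assumption for the sake of a self-contained sketch, CPython's built-in `compile()` stands in for the compiler and raw bytecode stands in for the binary; with a native toolchain, `build` would instead invoke the compiler and read the output file. `compile_once` and `distinct_outputs` are hypothetical names.

```python
import hashlib
import types
from typing import Callable

SOURCE = "def square(x):\n    return x * x\n"


def compile_once() -> bytes:
    # Stand-in "compiler": CPython's compile(). Return the module bytecode
    # plus the bytecode of any nested code objects (here, square()).
    module = compile(SOURCE, "square.py", "exec")
    nested = [c for c in module.co_consts if isinstance(c, types.CodeType)]
    return module.co_code + b"".join(c.co_code for c in nested)


def distinct_outputs(build: Callable[[], bytes], runs: int = 10) -> int:
    # Run the same build `runs` times and count distinct digests;
    # 1 means every run produced byte-identical output.
    return len({hashlib.sha256(build()).hexdigest() for _ in range(runs)})


print(distinct_outputs(compile_once))  # 1
```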

sigbottle 3 hours ago

I think one of the best ways to understand the "nice property" of compilers we like isn't necessarily determinacy, but "programming models".

There's this really good blog post about how autovectorization is not a programming model https://pharr.org/matt/blog/2018/04/18/ispc-origins

The point is that you want to reliably express semantics in the top level language, tool, API etc. because that's the only way you can build a stable mental model on top of that. Needing to worry about if something actually did something under the hood is awful.

Now of course, that depends on the level of granularity YOU want. When writing plain code, even if it's expressively rich in logic and semantics (e.g. C++ template metaprogramming), sometimes I don't necessarily care about the specific linker and assembly details (but sometimes I do!)

The issue, I think, is that building a reliable mental model of an LLM is hard. Note that "reliable" is the key word: consistent, be it consistently good or bad. The frustrating thing is that it can sometimes deliver great value and sometimes brick horribly, and we don't have a good idea of the mental model yet.

To constrain said possibility space, we tether to absolute memes (LLMs are fully stupid or LLMs are a superset of humans).

Idk where I'm going with this

candiddevmike 3 hours ago

Yes, they will output the same file hash every time, short of some build time mutation. Thus we can have nice things like reproducible builds and integrity checks.
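An integrity check of this kind reduces to hashing a locally rebuilt artifact and comparing it with a published digest. A minimal sketch follows; `sha256_file` and `matches_published` are hypothetical helpers, the published digest is assumed to be a hex SHA-256 string, and the demo uses a throwaway temp file containing `abc` (whose SHA-256 is the standard FIPS test vector) in place of a real binary.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    # Stream the file so large artifacts don't have to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    # With a reproducible build, a mismatch means the inputs, the build
    # environment, or the published artifact differ.
    return sha256_file(path) == published_hex.strip().lower()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
ok = matches_published(
    tmp.name, "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD"
)
bad = matches_published(tmp.name, "0" * 64)
os.remove(tmp.name)
```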