mpyne 4 hours ago

The whole field of reproducible builds is only a field because compilers also have had trouble, historically, producing binary artifacts with guaranteed provenance and binary compatibility, even when built from the same source code.

If I assign a bug fix ticket to a human developer on my team, I won't be able to precisely replicate how they go about solving the bug, but for many bugs I can at least be assured that the bug will get solved and that I understand the basic approach the assigned dev would use to troubleshoot and resolve the ticket.

This is an organizational abstraction but it's an abstraction just the same, leaky as it is.

kibwen 2 hours ago

> The whole field of reproducible builds is only a field because compilers also have had trouble

No, this is not comparable. The reason reproducible builds are tricky is not because compilers are inherently prone to randomness, it's because binaries often bake in things like timestamps and the exact pathnames of the system used to produce the build. People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.
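A minimal sketch of the timestamp point, assuming a build step that honors the `SOURCE_DATE_EPOCH` environment variable (the convention promoted by the Reproducible Builds project): the embedded build time comes from a pinned value instead of the wall clock, so two builds of the same source produce identical bytes. The `make_artifact` helper is hypothetical, not any real toolchain's API.

```python
import hashlib
import os
import time


def make_artifact(source: bytes) -> bytes:
    """Hypothetical build step that embeds a build timestamp in its output."""
    # Honor SOURCE_DATE_EPOCH when it is set; otherwise fall back to the
    # wall clock, which makes every build's bytes differ.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    header = f"built-at:{epoch}\n".encode()
    return header + source


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


if __name__ == "__main__":
    src = b"int main(void) { return 0; }\n"
    os.environ["SOURCE_DATE_EPOCH"] = "1700000000"  # pin the timestamp
    first = make_artifact(src)
    second = make_artifact(src)
    print(digest(first) == digest(second))  # True: same source, same pinned time
```

Unset the variable and builds made a second or more apart no longer hash the same, which is the "choice that can be unwound" mpyne refers to below.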

mpyne an hour ago

> The reason reproducible builds are tricky is not because compilers are inherently prone to randomness

And neither are LLMs. Having their output employ randomness by default is a choice, not a requirement, just as embedding timestamps into builds is a choice that can be unwound if you want the build to be reproducible.

> People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.

They are certainly different things, but if you are going to criticize LLMs it would be better if you understood them.

jmuguy an hour ago

Are you arguing that the output of an LLM isn’t random?

mpyne 39 minutes ago

It is random if you select it to be (temperature != 0, etc.).

It is not random if you don't use random sampling in its output generation.

If the whole thing were actually stochastic, then prompt caching would be impossible, because a cache of what the previous tokens transformed into (kept to speed up future generation) would be invalidated by the missing random state.

Look at llama.cpp: you can see which samplers are adjustable, and if you use samplers that employ randomness you can see which settings disable the random sampling. Or you can employ randomness but fix the seed to get reproducible results.
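A toy sketch of the distinction being drawn here, not llama.cpp's actual sampler code: greedy decoding (what "temperature 0" usually means) is a pure argmax over the logits, and temperature sampling becomes repeatable once the random generator's seed is fixed. The logits, `next_token`, and `sample_run` are made up for illustration.

```python
import math
import random

logits = [1.2, 3.4, 0.5, 2.9]  # hypothetical scores for a 4-token vocabulary


def softmax(scores, temperature):
    # Divide by temperature, subtract the max for numerical stability, normalize.
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(scores, temperature, rng=None):
    """Pick a token index from the scores."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness.
        return max(range(len(scores)), key=lambda i: scores[i])
    probs = softmax(scores, temperature)
    # Random sampling: the choice depends on the RNG's state.
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


def sample_run(seed, n=10):
    # A fresh generator with a fixed seed replays the same draws every time.
    rng = random.Random(seed)
    return [next_token(logits, 0.8, rng) for _ in range(n)]


print([next_token(logits, 0) for _ in range(5)])  # [1, 1, 1, 1, 1]
print(sample_run(42) == sample_run(42))  # True
```

This only covers the sampling step; it assumes the logits themselves come out bit-identical on every run.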

achierius an hour ago

> Having their output employ randomness by default is a choice, not a requirement

This is not really meaningfully true. E.g., batching, heterogeneous inference hardware, and even differences in model versions can change the result you get, and these are hard to solve.
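One common mechanism behind the batching and hardware point, offered as an illustration rather than something the comment spells out: floating-point addition is not associative, so a kernel that reduces the same numbers in a different order (as can happen with a different batch size or GPU) may produce slightly different logits. With a near-tie, greedy decoding can then pick a different token. The two-token "logits" below are hypothetical.

```python
# The same three values summed in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False


def argmax(xs):
    # max() returns the first maximal index on ties.
    return max(range(len(xs)), key=lambda i: xs[i])


# Hypothetical near-tie: token 1's score depends on the reduction order.
print(argmax([0.6, left]), argmax([0.6, right]))  # 1 0
```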

mpyne 35 minutes ago

But again, these are all things that are also true of build systems.

GCC 16.1 vs. 15.2 will get you differences. GNU ld vs. gold vs. mold vs. lld will get you differences. Whether or not you employ LTO or other whole-program optimization gets you differences.

Have you never debugged a race condition that worked on your machine but didn't work in prod, based only on how things ended up compiled in the final binary?

I'm not saying these are identical, but there's a lot more similarity than you all seem to understand. And we've made compilers work well enough that a lot of you believe they give very routine, very deterministic outputs as part of broader build systems, even though nothing could be further from the truth, even today.

z3c0 3 hours ago

It's an abstraction for you, not for the rest of that developer's team, who have to reproduce the same solution even after said developer has "won the lottery", so to speak.

inb4: "Don't worry, just use GPT to make the docs"

zadikian 11 minutes ago

Well, they don't have to win the lottery again, because you commit the code, not the prompt. Which is fine. Idk why anyone compares this to a compiler; it doesn't need to be a compiler to be useful.