daxfohl a day ago

I'd suggest the problem isn't that LLMs are nondeterministic. It's that English is.

With a coding language, once you know the rules, there's no two ways to understand the instructions. It does what it says. With English, good luck getting everyone and the LLM to agree on what every word means.
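To make that concrete with a hypothetical example (the records and field names are made up): an English instruction like "sort the records by name and date" admits at least two readings, while each code version pins down exactly one.

```python
# "Sort the records by name and date" is ambiguous in English
# (is name or date the primary key?), but each sort key below
# has exactly one meaning.
records = [
    {"name": "b", "date": "2024-01-02"},
    {"name": "a", "date": "2024-01-03"},
    {"name": "a", "date": "2024-01-01"},
]

# Reading 1: name first, then date.
by_name_then_date = sorted(records, key=lambda r: (r["name"], r["date"]))

# Reading 2: date first, then name.
by_date_then_name = sorted(records, key=lambda r: (r["date"], r["name"]))

# The two readings produce different orderings, so the English
# sentence underdetermines the program.
assert by_name_then_date != by_date_then_name
```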

Going with LLM-as-compiler, I expect that by the time you get the English precise enough to be "compiled", the document will be many times larger than the resulting code, will no longer be a reasonable requirements doc because it reads like code, and will be inscrutable to engineers because it's so verbose.

dworks a day ago

Sure, we cannot agree on the correct interpretation of the instructions. But we also cannot define what correct output is.

First, the term “accuracy” is somewhat meaningless when it comes to LLMs. Anything an LLM outputs is by definition “accurate” or “correct” from a technical point of view, because it was produced by the model. “Accuracy”, then, is not a technical or even a factual term but a sociological and cultural one: what is right or wrong is determined by society, and even we sometimes have a hard time determining what is true or not (see: philosophy).

miningape a day ago

What? What does philosophy have to do with anything?

If you cannot agree on the correct interpretation, nor on the output, what stops an LLM from solving the wrong problem? What stops an LLM from "compiling" the incorrect source code? What even makes it possible for us to solve a problem? If I ask an LLM to add a column to a table and it drops the table, that's a critical failure - not something to be reinterpreted as a "new truth".
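A hypothetical sketch of why correctness is objective here (the table and column names are invented for illustration): the intended change is checkable against the schema, so "it dropped the table" cannot be reinterpreted as an alternative truth.

```python
import sqlite3

# In-memory database with a made-up schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# The intended operation: add a column. This is the one correct
# interpretation of the request.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# A wrong "interpretation" (e.g. DROP TABLE users) would fail this
# objective check against the resulting schema.
cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
assert cols == ["id", "name", "email"], cols
```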

Philosophical arguments are fine when it comes to loose concepts like human language (interpretive domains). Computer languages, on the other hand, are precise and not open to interpretation (formal domains) - so philosophical arguments cannot be applied to them, only to the human interpretation of code.

It's like how mathematical "language" (again a formal domain) describes precise rulesets (axioms) and every "fact" (theorem) is derived from them. You cannot philosophise your way out of the axioms being the base units of expression, you cannot philosophise a theorem into falsehood (instead you must show through precise mathematical language why a theorem breaks the axioms). This is exactly why programming, like mathematics, is a domain where correctness is objective and not something that can be waved away with philosophical reinterpretation. (This is also why the philosophy department is kept far away from the mathematics department)

dworks a day ago

Looks like you misunderstood my comment. My point is that both input and output are too fuzzy for an LLM to be reliable in an automated system.

"Truth is one of the central subjects in philosophy." - https://plato.stanford.edu/entries/truth/

miningape a day ago

Ah yes, that makes a lot more sense - I understood your comment as something like "the LLMs are always correct, we just need to redefine how programming languages work"

I think I made it halfway to your _actual_ point and then just missed it entirely.

> If you cannot agree on the correct interpretation, nor output, what stops an LLM from solving the wrong problem?

dworks a day ago

Yep. I'm saying the problem is not just about interpreting and validating the output. You also need to interpret the question, since it's in natural language rather than code. So it's not just twice as hard but strictly impossible to reach 100% accuracy with an LLM, because you can't define what is correct in every case.

codingdave a day ago

It seems to me that we already have enough people using the "truth is subjective" arguments to defend misinformation campaigns. Maybe we don't need to expand it into even more areas. Those philosophical discussions are interesting in a classroom setting, but far less interesting when talking about real-world impact on people and society. Or perhaps "less interesting" is unfair, but when LLMs straight up get facts wrong, that is not the time for philosophical pontification about the nature of accuracy. They are just wrong.

dworks a day ago

I'm not making excuses for LLMs. I'm saying that when you have a non-deterministic system for which you have to evaluate all the output for correctness due to its unpredictability, that is a practically impossible task.

rickydroll a day ago

Yes, in general, English is non-deterministic; e.g., the presence or absence of an Oxford comma can change how a sentence reads.

When I programmed for a living, I found coding quite tedious and preferred to start with a mix of English and mathematics, describing what I wanted to do, and then translate that text into code. When I discovered Literate Programming, it was significantly closer to my way of thinking. Literate programming was not without its shortcomings and lacked many aspects of programming languages we have come to rely on today.

Today, when I write small to medium-sized programs, it reads mostly like a specification, and it's not much bigger than the code itself. There are instances where I need to write a sentence or brief paragraph to prompt the LLM to generate correct code, but this doesn't significantly disrupt the flow of the document.

However, if this is going to be a practical approach, we will need a deterministic system that can use English and predicate calculus to generate reproducible software.

daxfohl a day ago

Interesting, I'm the opposite! I far prefer to start off with a bit of code to help explore gotchas I might not have thought about and to help solidify my thoughts and approach. It doesn't have to be complete, or even compile. Just enough to identify the tradeoffs of whatever I'm doing.

Once I have that, it's usually far easier to flesh out the details in the detailed design doc, or go back to the Product team and discuss conflicting or vague requirements, or opportunities for tweaks that could lead to more flexibility or whatever else. Then from there it's usually easier to get the rest of the team on the same page, as I feel I'll understand more concretely the tradeoffs that were made in the design and why.

(Not saying one approach is better than the other. I just find the difference interesting.)