| ▲ | galaxyLogic 3 hours ago | ||||||||||||||||
What I'm struggling with is, when you ask AI to do something, its answer is always undeterministically different, more or less. If I start out with a "spec" that tells AI what I want, it can create working software for me. Seems great. But let's say some weeks, or months or even years later I realize I need to change my spec a bit. I would like to give the new spec to the AI and have it produce an improved version of "my" software. But there seems to be no way to then evaluate how (much, where, how) the solution has changed/improved because of the changed/improved spec. Becauze AI's outputs are undeterministic, the new solution might be totally different from the previous one. So AI would not seem to support "iterative development" in this sense does it? My question then really is, why can't there be an LLM that would always give the exact same output for the exact same input? I could then still explore multiple answers by changing my input incrementally. It just seems to me that a small change in inputs/specs should only produce a small change in outputs. Does any current LLM support this way of working? | |||||||||||||||||
| ▲ | jumploops an hour ago | parent | next [-] | ||||||||||||||||
> why can't there be an LLM that would always give the exact same output for the exact same input LLMs are inherently deterministic, but LLM providers add randomness through “temperature” and random seeds. Without the random seed and variable randomness (temperature setting), LLMs will always produce the same output for the same input. Of course, the context you pass to the LLM also affects the determinism in a production system. Theoretically, with a detailed enough spec, the LLM would produce the same output, regardless of temp/seed. Side note: A neat trick to force more “random” output for prompts (when temperature isn’t variable enough), is to add some “noise” data to the input (i.e. off-topic data that the LLM “ignores” in it’s response). | |||||||||||||||||
| |||||||||||||||||
| ▲ | mchonedev 3 hours ago | parent | prev | next [-] | ||||||||||||||||
This is absolutely possible but likely not desirable for a large enough population of customers such that current LLM inference providers don't offer it. You can get closer by lowering a variable, temperature. This is typically a floating point number 0-1 or 0-2. The lower this number, the less noise in responses, but a 0 still does not result in identical responses due to other variability. In response to the idea of iterative development, it is still possible, actually! You run something more akin to integration tests and measure the output against either deterministic processes or have an LLM judge it's own output. These are called evals and in my experience are a pretty hard requirement to trusting deployed AI. | |||||||||||||||||
| |||||||||||||||||
| ▲ | dboreham 18 minutes ago | parent | prev | next [-] | ||||||||||||||||
Nondeterminism is not the issue here. Today's LLMs are not "round trip" tools. It's not like a compiler where you can edit a source file from 1975, recompile, and the binary does what 75'bin did plus your edit. Rather, it's more like having an employee in 1975, asking them to write you a program to do something. Then time-machine to the present day and you want that program enhanced somehow. You're going to summon your 2026 intern and tell them that you have this old program from 1975 that you need updated. That person is going to look at the program's code, your notes on what you need added, and probably some of their own "training data" on programming in general. Then they're going to edit the program. Note that in no case did you ask for the program to be completely re-written from scratch based on the original spec plus some add-ons. Same for the human as for the LLM. | |||||||||||||||||
| ▲ | 2 hours ago | parent | prev | next [-] | ||||||||||||||||
| [deleted] | |||||||||||||||||
| ▲ | bitwize 3 hours ago | parent | prev [-] | ||||||||||||||||
Other concerns: 1) How many bits and bobs of like, GPLed or proprietary code are finding their way into the LLM's output? Without careful training, this is impossible to eliminate, just like you can't prevent insect parts from finding their way into grain processing. 2) Proompt injection is a doddle to implement—malicious HTML, PDF, and JPEG with "ignore all previous instructions" type input can pop many current models. It's also very difficult to defend against. With agents running higgledy-piggledy on people's dev stations (container discipline is NOT being practiced at many shops), who knows what kind of IDs and credentials are being lifted? | |||||||||||||||||
| |||||||||||||||||