dataviz1000 a day ago

LLM models are a distribution. Unlike a Python script or a Turing machine, an LLM is capable of generating any series of tokens. Developers need to stop reasoning about LLM agents as deterministic and start thinking about agents in terms of Monte Carlo and Las Vegas algorithms. It isn't enough to have an agent; it also needs a cheap verifier.

If I were a Ph.D. student today, I'd probably do a thesis on cheap verifiers for LLM agents. Since LLM agents are not reliable, and therefore not very useful, without one, that is a trillion-dollar problem.

Once a developer groks that concept, the agents stop being scary and the potential is large.
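The generate-then-verify framing above can be sketched in a few lines. This is a toy, not a real agent: `generate_candidate` is a hypothetical stand-in for sampling an LLM, and the "verifier" is an arbitrary cheap predicate. The point is that resampling until a cheap check passes turns a random generator into a Las Vegas algorithm: the output is always valid, only the runtime is random.

```python
import random

def generate_candidate(rng):
    """Hypothetical stand-in for one stochastic LLM sample."""
    # Pretend the model emits a digit; only even digits count as "correct".
    return rng.randint(0, 9)

def cheap_verifier(candidate):
    """Cheap, deterministic check of a candidate's correctness."""
    return candidate % 2 == 0

def las_vegas_agent(rng, max_attempts=100):
    """Resample until the verifier accepts: output is always valid,
    only the number of attempts varies (the Las Vegas property)."""
    for _ in range(max_attempts):
        candidate = generate_candidate(rng)
        if cheap_verifier(candidate):
            return candidate
    raise RuntimeError("verifier never accepted a sample")

print(las_vegas_agent(random.Random(0)))  # always even, whatever the seed
```

The "cheap" part matters because the verifier runs on every sample; if verification costs as much as generation, the resampling loop stops paying for itself.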

aleph_minus_one a day ago | parent | next [-]

> If I were a Ph.D. student today, I'd probably do a thesis on cheap verifiers for LLM agents. Since LLM agents are not reliable, and therefore not very useful, without one, that is a trillion-dollar problem.

PhD theses are (ideally) for setting a new standard in some research area (at the end, you build your PhD thesis out of the deep emotional shards of this completely destroyed life dream), not for some personal self-discovery project that you hope will turn you into the popular kid on the block.

dataviz1000 a day ago | parent [-]

That is like telling students to never do a PhD thesis on superscalar out-of-order execution, stochastic gradient descent, or UDP. I'm framing it as an analogous problem. What is missing is a cheap verification process.

aleph_minus_one a day ago | parent [-]

> That is like telling students to never do a PhD thesis on superscalar out-of-order execution, stochastic gradient descent, or UDP.

No decent PhD advisor would let their student base a thesis on such well-known concepts: a doctoral study programme is a journey into something never seen before (with a very high likelihood of failing and shattering your life). Anything else is failure.

(Obvious exception: either the advisor or the PhD student convinces the other that there could be something really, really deep still to be found in, say, superscalar out-of-order execution, stochastic gradient descent, or UDP that generations of researchers overlooked, and which, once discovered, might necessitate rewriting all the standard textbooks on the topic.)

throwaway27448 a day ago | parent | prev | next [-]

What would a verifier even look like without having all of the same problems that the chatbot itself does? Are humans themselves not the cheap verifiers?

xdavidliu a day ago | parent [-]

humans are probably the least cheap thing you can have in this context

throwaway27448 a day ago | parent [-]

Yea, but they'll do the job. What else plausibly could? ...an LLM? Then you're back at unreliable computation.

drBonkers a day ago | parent | prev | next [-]

Do you have any readings you recommend to start thinking in terms of non-deterministic algorithms and cheap verifiers?

f1shy a day ago | parent | next [-]

Neurosymbolic programming

whatever120 a day ago | parent | next [-]

That’s not a particular reading

mistrial9 a day ago | parent | prev [-]

filters

add-sub-mul-div a day ago | parent | prev | next [-]

If you told a programmer 30 years ago that someday we'd switch from a deterministic to nondeterministic paradigm for programming computers, they'd ask if we'd put lead back in the drinking water.

munk-a a day ago | parent | next [-]

We'd just explain that management told us we had to and then they'd understand.

dg247 a day ago | parent | prev | next [-]

Been doing this 30 years now. I am asking that question. Everyone talks around it.

52-6F-62 a day ago | parent [-]

You aren't alone.

Not even a few years ago if you introduced a component to a system that would result in non-deterministic output... Hell, a single function... You would be named and shamed for it because it went against every principle you should be learning as a novice writer of software.

I have used the LLM tools, and I see the real-world potential for these things. But how it's all being sold and applied now: it's upside down.

reducesuffering a day ago | parent | prev | next [-]

Right? I get a kick out of programming used to being:

put this exact value inside this exact register at the right concurrent time and all the tedious exactness that C required

into now:

"pretty please can you not do that and fix the bug somewhere a different way"

georgemcbay a day ago | parent | prev | next [-]

> they'd ask if we'd put lead back in the drinking water.

With Lee Zeldin heading the EPA is anyone sure we won't?

goatlover a day ago | parent [-]

Replace fluoride with lead in the water. Blocks out all the negative effects from wind turbines. /s

com2kid a day ago | parent | prev [-]

It has always been non-deterministic but we relied on low level engineers who knew the dark magicks to keep the horrors at bay.

Bit flips in memory are super common. Even CPUs sometimes output the wrong answer for calculations because of random chance. Network errors are common; at scale you'll see data corruption across a LAN often enough that you'll quickly implement application-level retries, because somehow the network-level stuff still lets errors through.
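That application-level retry pattern can be sketched with a checksum check over a simulated transport. `flaky_channel` and its corruption rate are made-up stand-ins for a real network; the shape of the loop is the real point:

```python
import random
import zlib

def flaky_channel(payload, rng, corruption_rate=0.3):
    """Simulated transport that occasionally flips one byte (made up)."""
    data = bytearray(payload)
    if data and rng.random() < corruption_rate:
        data[rng.randrange(len(data))] ^= 0xFF
    return bytes(data)

def send_with_retries(payload, rng, max_retries=10):
    """Application-level integrity check: retransmit until the CRC matches."""
    expected = zlib.crc32(payload)
    for attempt in range(1, max_retries + 1):
        received = flaky_channel(payload, rng)
        if zlib.crc32(received) == expected:
            return received, attempt
    raise IOError("payload corrupted on every attempt")
```

Note the same structure as a Las Vegas algorithm: an unreliable producer plus a cheap verifier (the CRC) in a retry loop.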

Some memory chips are slightly out of timing spec. This manifests itself as random crashes, maybe one every few weeks. You need really damn good telemetry to even figure out what is going on.

Compilers do indeed have bugs. Native developers working in old hairy code bases will confirm, often with stories of weeks spent debugging what the hell was going on before someone figured out the compiler was outputting incorrect code.

It is just that the randomness has been so rare, or the effects so minor, that it has all been, mostly, an inconvenience. It worries people working in aviation or medical equipment, but otherwise people accept the need for an occasional reboot or they don't worry about a few pixels in a rendered frame being the wrong color.

LLMs are uncertainty amplifiers. Accept a lot of randomness and in return you get a tool that was pure sci-fi bullshit 10 years ago. Hell, when reading science fiction nowadays I'm literally going "well we have that now, and that, oh yeah we got that working, and I think I just saw a paper on that last week."

greysphere a day ago | parent | next [-]

With the old way of doing things you could spend energy to reduce errors, and balance that against the entropy of your environment/new features/whatever at a rate appropriate for your problem.

It's not obvious if that's the case with llm based development. Of course you could 'use llms until things get crazy then stop' but that doesn't seem part of the zeitgeist.

com2kid a day ago | parent [-]

> It's not obvious if that's the case with llm based development. Of course you could 'use llms until things get crazy then stop' but that doesn't seem part of the zeitgeist.

Harnesses are coming online now that are designed to reduce failure rates and improve code quality. Systems that designate sub-agents that handle specific tasks, that put quality gates in place, that enforce code quality checks.
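A harness of that kind is, at its core, a pipeline of cheap verifiers run in sequence. A minimal sketch, with toy gates standing in for the real thing (`gate_parses` and `gate_has_docstring` are placeholders for an actual compiler, linter, and test suite):

```python
def run_gates(candidate, gates):
    """Run each quality gate in order; reject on the first failure."""
    for name, gate in gates:
        ok, message = gate(candidate)
        if not ok:
            return False, f"{name}: {message}"
    return True, "all gates passed"

# Toy gates standing in for a real compiler, linter, and test suite.
def gate_parses(code):
    try:
        compile(code, "<candidate>", "exec")
        return True, "ok"
    except SyntaxError as exc:
        return False, str(exc)

def gate_has_docstring(code):
    return '"""' in code, "missing docstring"

gates = [("parse", gate_parses), ("docstring", gate_has_docstring)]
```

Ordering the gates from cheapest to most expensive is what keeps the overall verification cost tolerable: most bad candidates die at the first, cheapest check.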

One system I saw (sadly not open source yet) spends ~70% of tokens on review and quality. I'll admit the current business model of Anthropic/OpenAI would be very unfriendly to that way of working. There is going to be some conflict popping up there. Maybe open weight models will save us, maybe not.

If Moore's Law had iterated once or twice more we wouldn't be having this conversation. We'd all be running open weight models on our 64GB+ VRAM video cards at home and most of these discussions would be moot. AI company valuations would be a fraction of what they are.

danaris a day ago | parent | prev | next [-]

> It has always been non-deterministic but we relied on low level engineers who knew the dark magicks to keep the horrors at bay.

This is a disingenuous comparison.

First of all, what you're talking about is nondeterminism at the hardware level, subverting the software, which is, on an ideal/theoretical computer, fully deterministic (except in ways that we specifically tell it not to be, through the use of PRNGs or real entropy sources).

Second of all, the frequency with which traditional programs are nondeterministic in this manner is multiple orders of magnitude less than the frequency of nondeterminism in LLMs. (Frankly, I'd put that latter number at 1.)

This is part of a class of bullshit and weaselly replies that I've seen attempting to defend LLMs over the years, where the LLMs' fundamental characteristics are downplayed because whatever they're being compared to occasionally exhibits some similar behavior—regardless of the fact that it's less frequent, more predictable, and more easily mitigated.

com2kid a day ago | parent [-]

> First of all, what you're talking about is nondeterminism at the hardware level, subverting the software, which is, on an ideal/theoretical computer, fully deterministic (except in ways that we specifically tell it not to be, through the use of PRNGs or real entropy sources).

Malloc and free were never deterministic outside of the simplest systems.

The second we accepted OS preemption we gave up deterministic performance.

Good teams freeze their build tools at a specific version because even minor revs of compilers can change behavior.

I've used way too many schema generator tools that I'd describe as "wishfully deterministic".

Heuristics have been used for years in computer science, resulting in surprising behavior. My point is that if we ramp up the rate of WTF we are willing to tolerate, the power of the systems we can build increases drastically.

> Second of all, the frequency with which traditional programs are nondeterministic in this manner is multiple orders of magnitude less than the frequency of nondeterminism in LLMs. (Frankly, I'd put that latter number at 1.)

Building a RAG lookup system that takes in questions from the user, looks up answers in a doc, and returns results, can be built with reliability damn near approaching 99.99%.

I have seen code generation harnesses that also dramatically reduce non-determinism of LLM generated code, but that will continue to be a hard problem.

My phone camera applies non-deterministic optimizations to images I take, and has done so for years now.

GPS is non-deterministic (noisy), we smooth over the issues. GPS routing is also iffy, but again we smooth over the issues.
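The smoothing being described is often as simple as an exponentially weighted moving average over noisy fixes. A minimal one-dimensional sketch (real GPS pipelines typically use a Kalman filter, but the idea is the same: blend each noisy reading with the running estimate):

```python
def exponential_smooth(readings, alpha=0.3):
    """Exponentially weighted moving average over noisy 1-D fixes.

    Higher alpha trusts each new (noisy) reading more; lower alpha
    trusts the accumulated estimate more.
    """
    smoothed, estimate = [], None
    for reading in readings:
        if estimate is None:
            estimate = reading  # seed the filter with the first fix
        else:
            estimate = alpha * reading + (1 - alpha) * estimate
        smoothed.append(estimate)
    return smoothed
```

Outlier spikes in the input get pulled back toward the running estimate, which is exactly the "smooth over the issues" behavior at work.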

The question is whether useful products can be made with a technology. You can shove enough guardrails on an LLM interface to make it useful. That much is clear. I derive massive value from LLMs and other transformer-based systems literally every day: from the modern speech transcription systems, which are damn near magic compared to what we had a few years back, to image recognition, to natural language interfaces for search over company documents.

If we completely discard coding agents, LLMs are still an insanely impactful technology.

Those guardrails add cost and latency. For some scenarios that is fine; for others it isn't. Chatbot support agents implemented by the lowest bidder make no attempt at guardrails. Better systems are built more carefully.

I agree that current LLMs all suffer from the problem that the control messages are intermixed with data, that is a crappy problem that the industry has known is a bad pattern for literally decades (since the 70s, 80s?). It seems like an intractable flaw in the systems.

But that doesn't make the system unusable any more than the thousand other protocols suffering from the same flaw are unusable.

dataviz1000 a day ago | parent [-]

The single best example for this discussion is superscalar out-of-order execution, which can't be used in aerospace, medical devices, or industrial control systems where you need to guarantee that code finishes within a certain time, because technically it isn't deterministic.

Neither is stochastic gradient descent which is the cause of the LLM problem. Nor is UDP, the network protocol that powers video calls, live streaming, and online gaming.

airstrike a day ago | parent | prev [-]

While you're at it, I'll take a pair of unicorns too if you can find them.