wizzwizz4 5 hours ago

From the article:

> There's a common rebuttal to this, and I hear it constantly. "Just wait," people say. "In a few months, in a year, the models will be better. They won't hallucinate. They won't fake plots. The problems you're describing are temporary." I've been hearing "just wait" since 2023.

We're not trending towards superintelligence with these AIs. We're trending towards (and, in fact, have already reached) superintelligence with computers in general, but LLM agents are among the least capable known algorithms for the majority of tasks we get them to do. The problem, as it usually is, is that most people don't have access to the fruits of obscure research projects.

Untrained children write better code than the most sophisticated LLMs, without even noticing they're doing anything special.

jnovek 4 hours ago | parent [-]

The rate of hallucination has gone down drastically since 2023. As LLM coding tools continue to pare that rate down, eventually we’ll hit a point where it is comparable to the rate at which we human programmers naturally introduce bugs.

wizzwizz4 3 hours ago | parent [-]

LLMs are still making fundamentally the same kinds of errors that they made in 2021. If you check my HN comment history, you'll see I predicted these errors, just from skimming the relevant academic papers (which is to say they're obvious: I'm far from the only person saying this). There is no theoretical reason we should expect them to go away, unless the model architectures fundamentally change (and no, GPT -> LLaMA is not a fundamental change), because they're not removable discontinuities: they're indicative of fundamental capability gaps.

I don't care how many terms you add to your Taylor series: your polynomial approximation of a sine wave is never going to be suitable for additive speech synthesis. Likewise, I don't care how good your predictive-text transformer model gets at instrumental NLP subtasks: it will never be a good programmer (except as far as it's a plagiarist). Just look at the Claude Code source code: if anyone's an expert in agentic AI development, it's the Claude people, and yet the codebase is utterly unmaintainable dogshit that shouldn't work and, on further inspection, doesn't work.
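To make the Taylor series analogy concrete, here is a minimal sketch (not from the comment; the function names `taylor_sin` and `max_error` are made up for illustration). It truncates the Maclaurin series for sine at a fixed number of terms and measures the worst-case error over an interval. For any fixed number of terms the approximation is excellent near zero and useless once the interval spans a few periods, which is the regime a sustained tone would need.

```python
import math


def taylor_sin(x: float, terms: int) -> float:
    """Truncated Maclaurin series: sum over k < terms of (-1)^k x^(2k+1) / (2k+1)!."""
    total = 0.0
    term = x  # the k = 0 term
    for k in range(terms):
        total += term
        # Ratio between consecutive terms: -x^2 / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


def max_error(terms: int, x_max: float, samples: int = 1000) -> float:
    """Largest absolute error against math.sin on evenly spaced points in [0, x_max]."""
    points = (x_max * i / samples for i in range(samples + 1))
    return max(abs(taylor_sin(x, terms) - math.sin(x)) for x in points)


if __name__ == "__main__":
    for terms in (5, 10):
        for x_max in (1.0, 20.0):
            print(f"terms={terms:2d} x_max={x_max:5.1f} max_error={max_error(terms, x_max):.3e}")
```

Adding terms pushes the usable interval outward, but for any fixed degree the polynomial eventually runs off to infinity while the sine stays bounded.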

That's not to say that no computer program can write computer programs, but this computer program is well into the realm of diminishing returns.