quotemstr 3 hours ago

The first thing I do when I see a paper that claims transformers fundamentally can't do X or Y is to look at the models under test:

> To evaluate generalizability, we conducted tests of GPT-5 (41), Claude Opus 4.1 (42), and Gemini 2.5 Pro (43) from 2025 September

The problem with empirical negative results on LLMs is that they can't rule out that the alleged deficiencies disappear with increased scale and the right fine-tuning. It's like saying my dog has trouble with subject-verb agreement, so meat brains are "fundamentally limited in their capacity for grammar".

I can accept that current LLMs (even the latest generation) might exhibit cognitive gaps similar to those we see in humans with deficient executive function, but I can't accept these gaps as evidence of fundamental limits of the transformer architecture. LLMs are universal function approximators. Executive function is a function. Yes, yes, it's well-known that transformers have a circuit complexity limit set by layer count and whatever. The limit disappears once you allow for autoregression. Nobody cares about the limits of AI inside a single forward pass.

I have high confidence that with the right sort of training, executive function gaps in LLMs can be addressed. I'm not convinced that the problem is the architecture per se.

vlovich123 7 minutes ago

You’re just complaining they can’t prove a negative, which is literally impossible.

“I can accept fairies don’t exist today but that doesn’t mean fairies won’t exist in the future.”

The burden of proof lies with those claiming the transformer is able to do something like this. In fact, given that our brains don’t have anything resembling transformers, that they don’t learn anything like the way we train models, and that they have all sorts of integrated memory mechanisms that we merely imitate with party tricks like vector databases, I think it’s safer to err on the side of assuming that existing transformers fail in very specific ways that human brains generally do not. Also, we clearly haven’t really seen major architectural changes for transformers for a few years now. Most of it has been RL gains, not structural improvements. So it stands to reason that the deficiencies will remain even if we figure out ways to paper over them on a case-by-case basis.

derbOac an hour ago

You might be completely correct, although my hunch is this is something that would require a change in architecture rather than increases in scale.

The failure points happen in a fairly simple task (Stroop) as the number of repeated trials increases. It's not like the number of colors or color words is increasing, which is the sort of thing I might expect if it had to do with the size of the LLM.
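
For concreteness, here is a minimal sketch of the kind of setup I mean: the color set stays fixed and only the number of trials grows. This is not the paper's actual protocol, which isn't spelled out in this thread; the color list, function names, and the two toy responders are all made up for illustration.

```python
import random

# Assumption: a small fixed color set; only the trial count varies.
COLORS = ["red", "green", "blue", "yellow"]

def make_trials(n_trials, seed=0):
    """Return n_trials (word, ink) pairs where the word never matches the ink."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        ink = rng.choice(COLORS)
        # Always incongruent: pick a color word different from the ink color.
        word = rng.choice([c for c in COLORS if c != ink])
        trials.append((word, ink))
    return trials

def accuracy(trials, respond):
    """Fraction of trials where respond(word, ink) names the ink color."""
    correct = sum(respond(word, ink) == ink for word, ink in trials)
    return correct / len(trials)

def name_the_ink(word, ink):
    # Hypothetical ideal responder: follows the instruction every time.
    return ink

def read_the_word(word, ink):
    # Hypothetical "derailed" responder: reads the word instead.
    return word
```

Running `accuracy(make_trials(n), respond)` for increasing `n` with the same `COLORS` is the comparison I have in mind: task difficulty per trial is constant, so any drop in accuracy comes from repetition alone.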

On the other hand who knows. I agree that model scale changes make a lot of things a moving target.

At first I thought this paper was kind of odd, but then I felt like it might be onto something important. Intuitively I could see the possibility that whatever is causing this failure in the Stroop task might be related to the tendency of LLMs to be "derailable".