GRAM is another one of those "stupid specific architectures", same as HRMs and the like. It can sort of compete with LLMs on specific puzzles; it demonstrated that much. It's not a general contender against LLMs at LLM tasks.

If you subscribe to things like "there are tasks LLMs are innately bad at due to insufficient depth and lack of recurrent capability", then GRAM might be another signal towards that.

But keep in mind: even the ARC-AGI benchmarks have their frontiers dominated by LLMs. Even if "innately bad" is true, it clearly doesn't go all the way to "innately incapable".

onlyrealcuzzo 16 hours ago

A 10M param GRAM model beat o3-mini - a model 2000x its size - on ARC-AGI...

ACCount37 15 hours ago

And then that 10M param GRAM went and got its shit kicked in by Grok 4.20 Blaze It Edition - on the same ARC-AGI battery. I know how that story goes.

It's the pattern with those "stupid specific architectures". Very good at this one thing. But only ever "good for their size", and only to a point.

They don't scale up and they don't generalize. Go far enough on task complexity and LLMs just kill them.

Does that make them useless? As an LLM replacement, yes. In general? Maybe not, I can think of things. But I have yet to find any paper demonstrating a real-world use.

onlyrealcuzzo 14 hours ago

GRAM is something you add onto an LLM... It's not an LLM replacement. It's like an MLA caching layer, an MoE routing layer, or a speculative decoder at the end...

yorwba 7 hours ago

You could certainly bolt GRAM onto an LLM, but that won't magically improve its reasoning.

It's a special-purpose design for constraint-satisfaction problems with simple rules, but complex interactions. E.g. when solving a Sudoku, the set of valid choices at every step is easy to determine, but you could make a series of valid choices that back you into a corner where no more progress is possible and you have to backtrack.
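To make the Sudoku point concrete, here is a minimal sketch in plain Python. It is ordinary classical search, not GRAM and not anything from the paper; the 4x4 grid, the single clue, and the names `candidates`, `greedy`, and `solve` are all made up for illustration. The valid moves at each cell are trivial to compute, yet always taking the first valid move runs into a dead end, while the backtracking version recovers by undoing earlier choices.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 4x4 Sudoku with 2x2 boxes; 0 marks an empty cell


def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Values that break no row, column, or box rule at (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(4)}
    br, bc = 2 * (r // 2), 2 * (c // 2)
    used |= {grid[i][j] for i in range(br, br + 2) for j in range(bc, bc + 2)}
    return [v for v in range(1, 5) if v not in used]


def first_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """First empty cell in row-major order, or None if the grid is full."""
    for r in range(4):
        for c in range(4):
            if grid[r][c] == 0:
                return r, c
    return None


def greedy(grid: Grid) -> Optional[Grid]:
    """Always take the smallest valid value and never undo a choice.

    Returns the filled grid, or None when a cell has no valid value left.
    """
    grid = [row[:] for row in grid]
    while True:
        cell = first_empty(grid)
        if cell is None:
            return grid
        r, c = cell
        options = candidates(grid, r, c)
        if not options:
            return None  # every earlier move was valid, but we are stuck
        grid[r][c] = options[0]


def solve(grid: Grid) -> bool:
    """Depth-first search with backtracking; fills `grid` in place."""
    cell = first_empty(grid)
    if cell is None:
        return True
    r, c = cell
    for v in candidates(grid, r, c):
        grid[r][c] = v
        if solve(grid):
            return True
        grid[r][c] = 0  # undo the choice and try the next value
    return False


# Hypothetical puzzle with a single clue: a 4 at row 1, column 2.
PUZZLE: Grid = [
    [0, 0, 0, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    print("greedy:", greedy(PUZZLE))
    solved = [row[:] for row in PUZZLE]
    print("backtracking solved:", solve(solved))
    for row in solved:
        print(row)
```

On this puzzle `greedy` fills the first row with 1, 2, 3 and then finds no legal value for the fourth cell, so it returns `None`. `solve` hits the same dead end, backs up to the second cell, and eventually completes the grid. Every individual step was locally valid in both cases; the difficulty is entirely in how the choices interact.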

Meanwhile, LLM reasoning failures are more often of the kind where a choice is clearly invalid (as judged by a human observer), but the LLM picks it anyway, because the underlying rule is complex and context-dependent and the model only learned an imperfect approximation that often breaks down.

GRAM won't help with that.

ACCount37 2 hours ago

My vision for what might happen: an LLM emits a "neural constraint satisfaction task" in latent space, kicks a "neural tool call" into a non-LLM architecture, runs that architecture, gets a latent answer back, and attends to that answer to generate better text answers for problems that benefit from improved constraint satisfaction. But that's a very hard thing to implement, and the gains are uncertain. Thus "might".