cs702 5 hours ago

Nice!

Briefly, the idea is to recursively decompose tasks into the simplest possible steps, call (relatively small) LLMs as agents to execute one step at a time, and use a clever voting scheme to choose how to execute each step. The authors use this technique to get a relatively small LLM to solve Towers of Hanoi with 20 rings (2^20 - 1, roughly 1M steps). All of it using natural language.
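
For concreteness, here's a minimal sketch of the control loop as I read it (my own pseudocode, not the authors' code; the caller supplies the hypothetical llm, is_atomic, and split helpers, and the paper's actual voting rule may differ from a plain majority):

    from collections import Counter

    def solve(task, llm, is_atomic, split, n_votes=5):
        # Recurse until a task is a single atomic step.
        if not is_atomic(task):
            return [solve(sub, llm, is_atomic, split, n_votes)
                    for sub in split(task)]
        # Sample several independent agents for the step, then vote.
        proposals = [llm(f"Execute this single step: {task}")
                     for _ in range(n_votes)]
        return Counter(proposals).most_common(1)[0][0]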

The most obvious question is whether other tasks, more interesting -- less "rote" -- than Towers of Hanoi, can similarly be recursively decomposed into simple steps. I'm not sure that's always possible.

wordpad 4 hours ago | parent | next [-]

This works because the problem can be broken down into prompts on which the model rarely hallucinates.

Most real-world prompts can't be reduced to something so consistent and reliable.

Their key finding was that the number of votes grows linearly with the number of prompts you are trying to chain.

However, the issue is that the number of votes you need grows exponentially with the hallucination rate.
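
A quick back-of-the-envelope on that scaling (my own sketch, assuming independent votes and a simple per-step majority; not necessarily the paper's exact voting rule): to finish N chained steps with high probability, the per-step error after voting has to be pushed below roughly 1/N, and the votes needed to get there climb quickly as the raw hallucination rate p approaches 50%.

    import math

    def majority_error(p: float, k: int) -> float:
        # Probability a majority of k independent votes is wrong,
        # when each vote is wrong with probability p (assumes p < 0.5).
        return sum(math.comb(k, j) * p**j * (1 - p)**(k - j)
                   for j in range(k // 2 + 1, k + 1))

    def votes_needed(p: float, n_steps: int, target: float = 0.99) -> int:
        # Smallest odd k so the whole n_steps chain succeeds
        # with probability >= target.
        per_step = target ** (1 / n_steps)  # required per-step success
        k = 1
        while 1 - majority_error(p, k) < per_step:
            k += 2  # odd k avoids ties
        return k

    for p in (0.01, 0.05, 0.10, 0.20):
        print(p, votes_needed(p, n_steps=1_000_000))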

patcon 3 hours ago | parent | prev | next [-]

> into the simplest possible steps, recursively call (relatively small) LLMs as agents to execute one step at a time, and using a clever voting scheme to choose how to execute each step.

It's like humans! Everything old is new again :)

adastra22 5 hours ago | parent | prev | next [-]

Why not? That's basically how NASA manages large projects.

Uehreka 4 hours ago | parent | next [-]

One issue I often run into with this stuff is the tightly coupled nature of things in the real world. I’ll fashion an example:

Let’s say you break a job down into 3 tasks: A, B and C. Doing one of those tasks is too much for an LLM to accomplish in one turn (this is something you learn intuitively through experience), but an LLM could break each task into 3 subtasks. So you do that, and start by having the LLM break task A into subtasks A1, A2 and A3. And B into B1, B2 and B3. But when you break down task C, the LLM (which needs to start with a fresh context each time since each “breakdown” uses 60-70% of the context) doesn’t know the details of task A, and thus writes a prompt for C1 that is incompatible with “the world where A1 has been completed”.

This sort of “tunnel vision” is currently an issue with scaling 2025 agents. As useful context lengths get longer it’ll get easier, but figuring out how to pack exactly the right info into a context is tough, especially when the tool you’d reach for to automate it (LLMs) are the same tool that suffers from these context limitations.
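
One mitigation, sketched under assumptions (the llm helper below is a hypothetical stub, and this is a workaround, not a full fix): thread a compact running summary of earlier breakdowns into each fresh context, so the prompt for C1 is written against "the world where A1 has been completed" rather than in a vacuum.

    def llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model client")  # hypothetical stub

    def decompose(task: str, summary: str) -> list[str]:
        # Fresh context each time, but seeded with a short summary of
        # sibling breakdowns so subtasks stay mutually compatible.
        return llm(
            f"Decisions so far:\n{summary}\n\n"
            f"Split into 3 subtasks consistent with them:\n{task}"
        ).splitlines()

    def fold_in(summary: str, task: str, subs: list[str]) -> str:
        # Compress the new breakdown so the summary stays a small,
        # fixed fraction of the context budget.
        return llm(
            f"Update this summary, keeping it under 200 words:\n{summary}\n"
            f"New fact: task {task} was split into: {'; '.join(subs)}"
        )

    summary = ""
    plan = {}
    for task in ("A", "B", "C"):
        plan[task] = decompose(task, summary)
        summary = fold_in(summary, task, plan[task])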

None of this means big things aren’t possible, just that the fussiness of these systems increases with the size of the task, and that fussiness leads to more requirements for “human review” in the process.

pinkmuffinere 4 hours ago | parent | prev | next [-]

Reasoning by analogy is great for intuition, but doesn’t guarantee the conclusion actually holds. Consider “voltage is like water pressure in pipes, so if there’s a cut in my wire’s insulation, the device won’t get enough voltage”. Clearly that’s not true, even though the underlying analogy is generally useful.

alwa 4 hours ago | parent | next [-]

I really like that analogy, thank you for it. Also applies to “it’s overvoltage, so I just need to poke a little hole in it to let the excess bleed out”…

wat10000 3 hours ago | parent [-]

That one can work, briefly, depending on how conductive your tool is.

CamperBob2 3 hours ago | parent | prev [-]

Well, corona losses are a thing, after all.

etamponi 4 hours ago | parent | prev | next [-]

"basically" is doing a lot of work in this sentence.

Julien_r2 5 hours ago | parent | prev | next [-]

I could imagine that even a small task at NASA involves more knowledge and logic than the smallest step of a Towers of Hanoi problem.

It depends on what counts as small enough for the LLM to resolve with high confidence.

th0ma5 3 hours ago | parent | prev | next [-]

This is a really good analogy, because the complex interactions between multiple groups working independently while trying to collaborate in a hierarchy toward one large goal were among the things that hid the problems that led to the Challenger disaster, according to Feynman.

mulmen 5 hours ago | parent | prev [-]

NASA has done a lot of amazing things but I wouldn’t bet on them winning a Super Bowl.

HarHarVeryFunny 3 hours ago | parent [-]

They'd have a 50% chance of winning one on Mars, since it would just be NASA vs China

bangaladore 3 hours ago | parent [-]

Every year NASA has a 50% chance of winning the Superbowl- even on Earth!

Either they win or don't. /s

esafak 3 hours ago | parent | prev | next [-]

It seems like this could be implemented by any harness.

naasking 4 hours ago | parent | prev [-]

> All of it using natural language.

Combining this with those approaches that recursively reason in latent space would be interesting.