falcor84 10 hours ago

The thing that TFA doesn't seem to go into is that these mathematical results apply to human agents in exactly the same way as they do to AI agents, and nevertheless we have massive codebases like Linux. If people can figure out how to do it, then there's no math that can help you prove that AIs can't.

pydry 5 hours ago | parent | next [-]

I've yet to see a human process that used a large number of cheap junior developers, however precisely architected, to create high-quality software.

If that could have been achieved it would have been very profitable, too. There's no shortage of cheap, motivated interns/3rd-world devs, and the executive class prefers to rely on disposable resources even when it costs more overall.

The net result was always the opposite, though: one or two juniors on a leash could be productive, but more than that always caused more problems than it solved.

Seeing the same problem with agents. Multi agent orchestration seems like a scam to manufacture demand for tokens.

falcor84 3 hours ago | parent [-]

I'm in absolute agreement that the AI coordination problem exists today when the AI is at junior level. I'm just saying that the mathematical argument is silly to apply to arbitrary future AIs, if and when they reach human capability. Because while coordination problems have not been mathematically solved, the world economy is a case in point that it is possible to coordinate human-level agents to achieve large scale projects at generally sufficient quality levels.

So to be clear, I'm not advising anyone to change their current token consumption habit. I'm just saying that it's silly to apply math to prove the impossibility of something we can literally see around us. It's like a mathematical proof that water isn't really wet.

bloppe 9 hours ago | parent | prev [-]

Humans can also count the number of Rs in strawberry, but good luck proving that mathematically

EagnaIonat 9 hours ago | parent | next [-]

LLMs don't see words. They see tokens, which is why they previously had a hard time counting the r's.

You can certainly prove that mathematically, and if you give that proof to an LLM, it will give you the correct answer. That's a prompting technique, btw, to improve the accuracy of an LLM's results.

Although the more recent LLMs have built-in approaches to answering this question.

Gemma4 for example:

-----

The user is asking for the number of 'r's in the word "strawberry".

* S

* T

* *R* (1)

* A

* W

* B

* E

* *R* (2)

* *R* (3)

* Y

There are 3 'r's.

-----

GPT-OSS-120b example.

-----

The user asks: "How many r's in strawberry." This is a straightforward counting problem: count the letter 'r' in the word "strawberry". The word "strawberry" contains letters: s t r a w b e r r y. Count r's: there are three 'r's? Let's check: s(1) t(2) r(3) a(4) w(5) b(6) e(7) r(8) r(9) y(10). Actually the word is "strawberry". Let's write out: s, t, r, a, w, b, e, r, r, y. So there are three r's: at positions 3, 8, 9. Yes, three r's. So answer: 3.

But we may also note that "r" appears three times.

Thus answer: 3.

We can provide a short answer.

Thus final: There are three r's in "strawberry".

----
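For contrast, the count the transcripts above work out step by step is trivial to verify deterministically in code. A minimal sketch (plain Python, no LLM involved):

```python
def count_letter(word: str, letter: str) -> int:
    """Case-insensitive count of `letter` in `word`."""
    return word.lower().count(letter.lower())

# The strawberry problem, solved exactly.
print(count_letter("strawberry", "r"))  # 3
```

This is the ground truth the models are reasoning their way toward token by token.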

kang 7 hours ago | parent [-]

Doubt if you can make a dumb model smart by feeding it proofs

EagnaIonat 5 hours ago | parent [-]

https://www.promptingguide.ai/techniques/knowledge

Tade0 4 hours ago | parent [-]

Sounds like a great way to fill up the context before you even start.

falcor84 3 hours ago | parent [-]

Yes, what's your point? That is literally what it does: it adds relevant knowledge to the prompt before generating a response, in order to ground it more effectively.

Tade0 21 minutes ago | parent [-]

My point is that this doesn't scale. You want the LLM to have knowledge embedded in its weights, not prompted in.

tacotime 9 hours ago | parent | prev [-]

I doubt it is possible to mathematically prove much inside a black box of billions of interconnected weights. But at least in the narrow case of the strawberry problem, it seems likely that LLM inference could reliably recognize that sort of problem as one that would benefit from a letter-counting tool call as part of the response.
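That routing step could look something like the sketch below. This is purely illustrative: `maybe_tool_call` is a hypothetical helper, and real systems do the recognition with the model itself rather than a regex, but the shape is the same — detect the question type, delegate the counting to deterministic code.

```python
import re

def count_letter(word: str, letter: str) -> int:
    """Deterministic letter-counting 'tool'."""
    return word.lower().count(letter.lower())

def maybe_tool_call(prompt: str):
    """Hypothetical router: if the prompt looks like a letter-counting
    question, answer with the tool instead of token-level reasoning.
    Returns None to signal falling back to normal generation."""
    m = re.search(r"how many (\w)'?s? (?:are )?in (\w+)", prompt, re.IGNORECASE)
    if m:
        letter, word = m.groups()
        return count_letter(word, letter)
    return None

print(maybe_tool_call("How many r's in strawberry?"))  # 3
```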