gshulegaard 6 days ago

> but we know that reasoning is an emergent capability!

Do we though? There is widespread discussion and a growing belief that it is, but I have yet to see conclusive evidence of it. That is, in part, why the subject paper exists...it seeks to explore this question.

I think the author's bias is bleeding fairly heavily into his analysis and conclusions:

> Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question. It depends on having a clear definition of what “real” reasoning is, exactly.

I think it's pretty obvious that the researchers are exploring whether or not LLMs exhibit evidence of _Deductive_ Reasoning [1]. The entire experiment design reflects this. Claiming that they haven't defined reasoning and therefore cannot conclude or hope to construct a viable experiment is...confusing.

The question of whether or not an LLM can take a set of base facts and compose them to solve a novel/previously unseen problem is interesting and what most people discussing emergent reasoning capabilities of "AI" are tacitly referring to (IMO). Much like you can be taught algebraic principles and use them to solve for "x" in equations you have never seen before, can an LLM do the same?
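
For concreteness, here is a toy sketch (not from the paper; the relation, names, and the `model_answer` stub are my own hypothetical stand-ins) of the kind of compositional probe being described: the prompt states only single-hop facts, and the question requires chaining two of them into a relation the prompt never states directly.

    # Toy two-hop probe: single-hop facts are stated; answering requires
    # composing two of them into a relation (grandparent) never stated.
    FACTS = {            # parent_of, one hop at a time
        "alice": "bob",  # alice is the parent of bob
        "bob": "carol",  # bob is the parent of carol
    }

    def deduced_grandparent(person):
        """Ground truth by explicit deduction: compose parent_of with itself."""
        parent = next((p for p, c in FACTS.items() if c == person), None)
        if parent is None:
            return None
        return next((g for g, c in FACTS.items() if c == parent), None)

    def build_prompt(person):
        facts = "\n".join(f"{p} is the parent of {c}." for p, c in FACTS.items())
        return f"{facts}\nQuestion: who is the grandparent of {person}?"

    def model_answer(prompt):
        # Hypothetical stub -- swap in a real LLM call to actually run the probe.
        raise NotImplementedError

    print(build_prompt("carol"))
    print("Expected (by deduction):", deduced_grandparent("carol"))

The interesting case is when the required composition (here, grandparent) lies outside the distribution the model was trained on; the paper's claim is that performance degrades quickly as you move away from compositions the model has already seen.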

On that question I find this experiment interesting enough. It presents a series of facts and then gives the LLM tasks to see whether it can use those facts in novel ways not included in the training data (something a human might reasonably deduce). Their results and summary conclusions are relevant, interesting, and logically sound:

> CoT is not a mechanism for genuine logical inference but rather a sophisticated form of structured pattern matching, fundamentally bounded by the data distribution seen during training. When pushed even slightly beyond this distribution, its performance degrades significantly, exposing the superficial nature of the “reasoning” it produces.

> The ability of LLMs to produce “fluent nonsense”—plausible but logically flawed reasoning chains—can be more deceptive and damaging than an outright incorrect answer, as it projects a false aura of dependability.

That isn't to say LLMs aren't useful; this is just exploring their boundaries. To use legal services as an example, using an LLM to summarize or search for relevant laws, cases, or legal precedent is something it would excel at. But don't ask an LLM to formulate a logical rebuttal to opposing counsel's argument using those references.

Larger models and larger training corpora will expand that domain and make it more difficult for individuals to discern this limit; but just because you can no longer see a limit doesn't mean there isn't one.

And to be clear, this doesn't diminish the value of LLMs. Even without true logical reasoning LLMs are quite powerful and useful tools.

[1] https://en.wikipedia.org/wiki/Logical_reasoning

hakfoo 3 days ago | parent

Discerning the limits is the most important thing of all, and we seem very eager to obfuscate it for LLMs.

We so desperately want something we can sell as AGI, or at least as magic, that the boundaries on the tools are few and far between, and mostly based on legal needs ("don't generate nudes of celebrities who can sue us") rather than on understood technical limits.

The more complex and sophisticated the query, the harder it will be to double-check and make sure you're still on the rails. So it's the responsibility of the people offering the tools to understand and define their limits before customers unknowingly push their legal-assistant LLMs into full Sovereign Citizen mode.