This is exactly backwards. The brittleness arises because they emulate reasoning without actually performing it algorithmically.

Addendum: I pointed to this class of problems specifically because they require the ability to abstract in a way that the question itself does not immediately suggest. Math problems are different in that they are described in terms of art that are closely related to certain patterns of manipulation (that is, the paper texts tend to contain both in close proximity to one another).

aspenmartin 6 hours ago | parent [-]

For you, a system needs to reason perfectly and flawlessly, all the time? So humans do not reason? Humans don't have brittle failure modes?

> they require the ability to abstract in a way that the question itself does not immediately suggest

Yes, yet there are multitudes of other measurements of the same kind where LLMs reason perfectly well, and in many cases better than a human could.

> Math problems are different in that they are described in terms of art that are closely related to certain patterns of manipulation (that is, the paper texts tend to contain both in close proximity to one another).

Is your logic really that math problems are actually easier to answer without reasoning and just by blending together closely related papers? I would definitely suggest reading the literature a bit more on this topic.

gmueckl 2 hours ago | parent [-]

Humans are not flawless, but they are much, much better at reasoning than LLMs are. LLMs can be made to fail quite reliably and easily because they cannot build proper manipulable, predictive models. This is related to the point that Yann LeCun makes when advocating for world models (for the physical world) with predictive power.