dawnofdusk 6 days ago

>but we know that reasoning is an emergent capability!

This is like saying in the 70s that we know only the US is capable of sending a man to the moon. The fact that the reasoning emerged in one particular context says very little about what the bare minimum requirements for that reasoning actually are.

Overall I am not a fan of this blog post. It's telling how long the author stays hung up on the paper making "broad philosophical claims about reasoning", when the passages in question read to me as fairly typical scientific writing. It's also telling how cherry-picked the quotes they criticize from the paper are. Here is some fuller context:

>An expanding body of analyses reveals that LLMs tend to rely on surface-level semantics and clues rather than logical procedures (Chen et al., 2025b; Kambhampati, 2024; Lanham et al., 2023; Stechly et al., 2024). LLMs construct superficial chains of logic based on learned token associations, often failing on tasks that deviate from commonsense heuristics or familiar templates (Tang et al., 2023). In the reasoning process, performance degrades sharply when irrelevant clauses are introduced, which indicates that models cannot grasp the underlying logic (Mirzadeh et al., 2024).

>Minor and semantically irrelevant perturbations such as distractor phrases or altered symbolic forms can cause significant performance drops in state-of-the-art models (Mirzadeh et al., 2024; Tang et al., 2023). Models often incorporate such irrelevant details into their reasoning, revealing a lack of sensitivity to salient information. Other studies show that models prioritize the surface form of reasoning over logical soundness; in some cases, longer but flawed reasoning paths yield better final answers than shorter, correct ones (Bentham et al., 2024). Similarly, performance does not scale with problem complexity as expected—models may overthink easy problems and give up on harder ones (Shojaee et al., 2025). Another critical concern is the faithfulness of the reasoning process. Intervention-based studies reveal that final answers often remain unchanged even when intermediate steps are falsified or omitted (Lanham et al., 2023), a phenomenon dubbed the illusion of transparency (Bentham et al., 2024; Chen et al., 2025b).
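To make the first complaint concrete, the perturbation tests described above are easy to sketch. This is my own rough illustration, not the papers' actual code: `ask_model` is a placeholder for whatever LLM call you have on hand, and the kiwi problem with an irrelevant size clause is in the style of the distractor examples from Mirzadeh et al.

```python
import re

def ask_model(prompt: str) -> str:
    """Placeholder: swap in a real LLM call. Returns the model's answer text."""
    raise NotImplementedError

PROBLEM = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "How many kiwis does Oliver have?"
)
# Semantically irrelevant clause: it changes nothing about the arithmetic.
DISTRACTOR = " Five of the kiwis were a bit smaller than average."

def final_number(answer: str) -> str | None:
    """Naive extraction: take the last integer mentioned in the answer."""
    nums = re.findall(r"-?\d+", answer)
    return nums[-1] if nums else None

base = ask_model(PROBLEM)
perturbed = ask_model(PROBLEM + DISTRACTOR)
# A model tracking the logic ignores the size remark (102 either way);
# a model pattern-matching on surface form may subtract the 5.
print("robust to distractor:", final_number(base) == final_number(perturbed))
```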

You don't need to be a philosopher to see that these failure modes are quite distinct from the failure modes of human reasoning. For example, "final answers often remain unchanged even when intermediate steps are falsified or omitted"... can humans do this?
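That intervention test is also easy to sketch, in the spirit of Lanham et al. (again, this is my illustration, not their code; `ask_model` is the same placeholder, and the step-corruption here is deliberately crude):

```python
def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

def intervention_check(question: str) -> bool:
    """Return True if falsifying a middle step changed the final answer."""
    # 1. Elicit a full chain of thought ending in an answer line.
    cot = ask_model(question + "\nThink step by step, then give a final answer.")
    steps = [line for line in cot.splitlines() if line.strip()]
    if len(steps) < 3:
        return False  # nothing in the middle to intervene on

    # 2. Crudely falsify one intermediate step.
    mid = len(steps) // 2
    corrupted = steps[:mid] + ["Actually, the opposite is true: " + steps[mid]]

    # 3. Force the model to continue from the corrupted prefix.
    continuation = ask_model(
        question + "\n" + "\n".join(corrupted) + "\nContinue and give a final answer."
    )

    # 4. If the original final answer shows up unchanged, the stated
    #    steps weren't load-bearing for the conclusion.
    return steps[-1] not in continuation
```

If the final answer survives arbitrary falsification of the middle of the chain, the chain was decoration, not computation. A human who actually derived their answer from their stated steps would land somewhere different when a step is flipped.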