stratos123 2 days ago

That wouldn't necessarily be true even if models really "couldn't count", because software exists: if an LLM builds an Excel spreadsheet rather than doing everything manually, it's both much harder for it to mess up and easier for it to notice and recover. And it's even less true given what this paper actually tests, which is that LLMs don't achieve literally perfect accuracy when you give them increasingly large problems with zero thinking.
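To illustrate the point (a hypothetical sketch, not from the paper): a model that emits a snippet like this and runs it in an interpreter gets exact counts regardless of input size, instead of trying to count token by token.

```python
# Hypothetical example: delegate counting to software instead of doing it
# "in the model's head".
text = "the quick brown fox jumps over the lazy dog the end"
words = text.split()

# The interpreter does the counting, so the result is exact no matter
# how long the input grows.
total = len(words)              # number of words
occurrences = words.count("the")  # how many times "the" appears

print(total, occurrences)  # prints: 11 3
```

The same pattern applies to a spreadsheet formula like COUNTIF: the model only has to set the computation up correctly, and a setup mistake tends to be visible and fixable, unlike a silent miscount.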

(Confabulation is IMO a much bigger problem, but it's unrelated to architecture - it's an artifact of how models are currently trained.)