|
| ▲ | gordonhart 5 hours ago | parent | next [-] |
| Modern reasoning models are actually pretty good at arithmetic and almost certainly would have caught this error if asked. Source: we benchmark this sort of stuff at my company and for the past year or so frontier models with a modest reasoning budget typically succeed at arithmetic problems (except for multiplication/division problems with many decimal places, which this isn't). |
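| A minimal sketch of the kind of arithmetic check a harness like that might run (hypothetical: query_model stands in for whatever API wrapper is actually used): |

    import random

    def query_model(prompt: str) -> str:
        """Hypothetical wrapper around a reasoning model's API;
        returns the model's final answer as text."""
        raise NotImplementedError

    def arithmetic_accuracy(n_trials: int = 100) -> float:
        correct = 0
        for _ in range(n_trials):
            a = random.randint(10**6, 10**9)
            b = random.randint(10**6, 10**9)
            reply = query_model(f"Compute {a} + {b}. Reply with only the number.")
            try:
                correct += int(reply.strip().replace(",", "")) == a + b
            except ValueError:
                pass  # a malformed reply counts as a miss
        return correct / n_trials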
| |
| ▲ | RobotToaster 5 hours ago | parent [-] |
| Interesting, how have you found they perform at more complex things like calculus and analysis? |
|
|
| ▲ | literalAardvark 5 hours ago | parent | prev | next [-] |
| They can't do math? ChatGPT 5.2 has recently been churning through unsolved Erdős problems. I think one is currently partially validated by a professional mathematician, and the other one I know of is "ai-solved" but not yet verified. As in: we're the ones who can't quite keep up. https://arxiv.org/abs/2601.07421 And the only reason they can't count Rs is that we don't show them Rs, due to a performance optimization (tokenization). |
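| You can see that optimization directly with OpenAI's tiktoken library: the model is fed opaque subword token IDs, not characters (exact splits and IDs vary by encoding): |

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("strawberry")
    print(tokens)  # integer token IDs; exact values depend on the encoding
    print([enc.decode_single_token_bytes(t) for t in tokens])
    # e.g. [b'str', b'aw', b'berry'] -- at no point does the model see individual letters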
| |
▲ | ironbound 3 hours ago | parent [-] |
| You can feed it the Hodge Conjecture for all I care; the current algorithms are a joke, and without real breakthroughs you're just generating left-to-right text with billions in hardware. |
|
|
| ▲ | nine_k 5 hours ago | parent | prev | next [-] |
| An LLM usually has a powerful digital computer right at its disposal, and can use it as a tool to do precise calculations. |
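| A minimal sketch of that tool-use loop (the wiring is hypothetical; real agent frameworks differ): the model emits a structured call like calculate("17.38 * 402.9"), the harness evaluates it exactly, and the result is fed back into the context: |

    import ast
    import operator

    # Safe arithmetic evaluator exposed to the model as a "calculator" tool.
    _OPS = {
        ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg,
    }

    def calculate(expression: str) -> float:
        def ev(node):
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp):
                return _OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp):
                return _OPS[type(node.op)](ev(node.operand))
            raise ValueError("unsupported expression")
        return ev(ast.parse(expression, mode="eval").body)

    print(calculate("17.38 * 402.9"))  # ~7002.402, computed numerically rather than generated token by token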
|
| ▲ | brookst 5 hours ago | parent | prev | next [-] |
| More accurate to say they can’t see r’s: they process language as tokens, not as individual letters. |
|
| ▲ | UqWBcuFx6NV4r 5 hours ago | parent | prev [-] |
| Yes, yes. We’ve all seen the same screenshots. Very funny. Those of us who don’t base our technical understanding on memes are well aware that the tooling at the disposal of modern reasoning models gives them the capability to do such things. Please don’t bring the culture war here. |