|
| ▲ | gordonhart 5 hours ago | parent | next [-] |
| Modern reasoning models are actually pretty good at arithmetic and almost certainly would have caught this error if asked. Source: we benchmark this sort of stuff at my company and for the past year or so frontier models with a modest reasoning budget typically succeed at arithmetic problems (except for multiplication/division problems with many decimal places, which this isn't). |
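| A minimal sketch of the kind of arithmetic check a harness like that might run (hypothetical: query_model stands in for whatever API wrapper is actually used): |

    import random

    def query_model(prompt: str) -> str:
        """Hypothetical wrapper around a reasoning model's API;
        returns the model's final answer as text."""
        raise NotImplementedError

    def arithmetic_accuracy(n_trials: int = 100) -> float:
        correct = 0
        for _ in range(n_trials):
            a = random.randint(10**6, 10**9)
            b = random.randint(10**6, 10**9)
            reply = query_model(f"Compute {a} + {b}. Reply with only the number.")
            try:
                correct += int(reply.strip().replace(",", "")) == a + b
            except ValueError:
                pass  # a malformed reply counts as a miss
        return correct / n_trials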
| |
| ▲ | RobotToaster 5 hours ago | parent [-] |
| Interesting, how have you found they perform at more complex things like calculus and analysis? |
|
|
| ▲ | literalAardvark 5 hours ago | parent | prev | next [-] |
| They can't do math? ChatGPT 5.2 has recently been churning through unsolved Erdős problems. I think one is currently partially validated by a professional mathematician, and the other one I know of is "ai-solved" but not yet verified. As in: we're the ones who can't quite keep up. https://arxiv.org/abs/2601.07421 And the only reason they can't count Rs is that we don't show them Rs, due to a performance optimization (tokenization). |
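| You can see that optimization directly with OpenAI's tiktoken library: the model is fed opaque subword token IDs, not characters (exact splits and IDs vary by encoding): |

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("strawberry")
    print(tokens)  # integer token IDs; exact values depend on the encoding
    print([enc.decode_single_token_bytes(t) for t in tokens])
    # e.g. [b'str', b'aw', b'berry'] -- at no point does the model see individual letters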
| |
▲ | ironbound 3 hours ago | parent [-] |
| You can feed it the Hodge Conjecture for all I care; the current algorithms are a joke, and without real breakthroughs you're just generating left-to-right text with billions in hardware. |
|
|
| ▲ | nine_k 5 hours ago | parent | prev | next [-] |
| An LLM usually has a powerful digital computer right at its disposal, and can use it as a tool to do precise calculations. |
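| A minimal sketch of that tool-use loop (the wiring is hypothetical; real agent frameworks differ): the model emits a structured call like calculate("17.38 * 402.9"), the harness evaluates it exactly, and the result is fed back into the context: |

    import ast
    import operator

    # Safe arithmetic evaluator exposed to the model as a "calculator" tool.
    _OPS = {
        ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg,
    }

    def calculate(expression: str) -> float:
        def ev(node):
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp):
                return _OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp):
                return _OPS[type(node.op)](ev(node.operand))
            raise ValueError("unsupported expression")
        return ev(ast.parse(expression, mode="eval").body)

    print(calculate("17.38 * 402.9"))  # ~7002.402, computed numerically rather than generated token by token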
|
| ▲ | brookst 5 hours ago | parent | prev | next [-] |
| More accurate to say they can’t see r’s: they process language as tokens, not as individual letters. |
|
| ▲ | UqWBcuFx6NV4r 5 hours ago | parent | prev [-] |
| Yes, yes. We’ve all seen the same screenshots. Very funny. Those of us who don’t base our technical understanding on memes are well aware that the tooling at the disposal of modern reasoning models gives them the capability to do such things. Please don’t bring the culture war here. |