> the spicy autocomplete can solve difficult open math problems

No it can't. It can't even solve my son's 4th grade math homework. (This is a real use case for me, not a dumb benchmark.)

You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

▲

sanderjd an hour ago | parent | next [-]

I would genuinely be interested in knowing what you're doing that led you to this conclusion.

I would be shocked if I were unable to solve 4th grade math homework with any of the contemporary frontier models. I spend most days using them to do significantly more complex things than that.

▲

margalabargala an hour ago | parent [-]

If they took a blurry photo of the piece of paper and uploaded it to ChatGPT saying "solve this" then I would totally believe it. The frontier models are mostly obnoxiously bad at OCR and at properly ingesting what's on an image of a page.

If you write out the 4th grade math problem, they would have no trouble.

▲

otabdeveloper4 41 minutes ago | parent [-]

No, LLMs just can't do math.

	▲	bdamm 28 minutes ago | parent | next [-]
		They can definitely recognize the problem class and build programs to do math. So what's the difference? It's like saying that people can't turn high torque nuts on machine bolts, because you can't use your fingers to do it. But you can use a wrench, so effectively, we can turn high torque nuts on machine bolts even though it isn't something we can natively do unaided.
	▲	minimaxir 36 minutes ago | parent | prev [-]
		If your math does not involve multiplying 20-digit numbers, modern LLMs can "do" math even without a Python tool, however counterintuitive that seems for next-token prediction.

▲

skinner_ an hour ago | parent | prev | next [-]

> You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

Not the parent poster here. I do know things about math. I wrote a few papers related to the unit distance problem (https://arxiv.org/abs/2311.10069, https://arxiv.org/abs/2406.15317) and spent quite some time trying to solve it. I had no chance of coming up with the proof that the spicy autocomplete came up with. Dumb benchmark, sure.

	▲	otabdeveloper4 38 minutes ago | parent [-]
		LLMs are good with symbolic manipulation but can't reason. You can skirt around not reasoning in research math because so much of it is just extremely tedious symbolic manipulation. You can't cheat with advanced fourth grade math, though. They don't know algebra yet and can't substitute verbosity for reasoning.

▲

threatofrain an hour ago | parent | prev | next [-]

We're already long past that threshold.

▲

simonw an hour ago | parent | prev [-]

Reasoning models with access to Python have been able to solve 4th grade math homework for over a year now. Prove me wrong: show me a 4th grade math problem they can't handle.

▲

otabdeveloper4 an hour ago | parent [-]

> show me a 4th grade math problem they can't handle

Sure.

"8 7 6 5 4 3 2 1 - add minus signs and parentheses to get 31."

P.S. There is an answer online and some LLMs will just copy it verbatim. This doesn't count.

	▲	simonw 35 minutes ago | parent [-]
		Whoa, 4th grade math problems got hard! I'm not sure how I'd tackle that one myself.
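
For what it's worth, the puzzle quoted above is small enough to brute-force, which is roughly the "build a program to do the math" approach mentioned earlier in the thread. Below is a minimal Python sketch, not anyone's posted solution. The puzzle statement is terse, so these are assumptions: the digits stay in order, only binary minus and parentheses are allowed, and adjacent digits may be glued into multi-digit numbers (for example 3 and 2 into 32). That last assumption seems necessary: with single digits only, every reachable result has the same parity as 8+7+6+5+4+3+2+1 = 36, so an odd target like 31 cannot be hit. The names `splits`, `reachable`, and `solve` are made up for illustration.

```python
from functools import lru_cache
from itertools import combinations


def splits(digits):
    """Yield every way to cut the digit string into consecutive numbers."""
    n = len(digits)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield tuple(int(digits[a:b]) for a, b in zip(bounds, bounds[1:]))


def reachable(nums):
    """Map each value obtainable from nums (order fixed) using only binary
    minus and parentheses to one expression string that produces it."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # Values reachable from nums[i:j].
        if j - i == 1:
            return {nums[i]: str(nums[i])}
        out = {}
        for m in range(i + 1, j):  # position of the top-level minus
            for lv, le in go(i, m).items():
                for rv, re_ in go(m, j).items():
                    # Subtraction is left-associative, so only a compound
                    # right operand needs parentheses.
                    rhs = re_ if j - m == 1 else f"({re_})"
                    out.setdefault(lv - rv, f"{le} - {rhs}")
        return out

    return go(0, len(nums))


def solve(digits="87654321", target=31, allow_concat=True):
    """Return one expression hitting target, or None if none exists
    under the assumptions stated above."""
    for nums in splits(digits):
        if not allow_concat and len(nums) != len(digits):
            continue  # assumption switch: forbid gluing digits together
        found = reachable(nums).get(target)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(solve())                    # some expression that evaluates to 31
    print(solve(allow_concat=False))  # None, by the parity argument
```

One expression that can be checked by hand under these assumptions is 8 - (7 - 6) - 5 - (4 - 32 - 1) = 31; the search may return a different one, since it stops at the first hit.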