The AI revolution in math has arrived(quantamagazine.org)
38 points by sonabinu 4 hours ago | 20 comments
dogscatstrees 43 minutes ago | parent | next [-]

> As they did so, they also learned how to improve the prompts they gave AlphaEvolve. One key takeaway: The model seemed to benefit from encouragement. It worked better “when we were prompting with some positive reinforcement to the LLM,” Gómez-Serrano said. “Like saying ‘You can do this’ — this seemed to help. This is interesting. We don’t know why.”

Four of the top logical minds in the world are acknowledging this. It's mind-blowing, and we don't know why.

dataviz1000 17 minutes ago | parent | next [-]

I know why.

Several people had problems with Sonnet burning through all their credits grinding on a problem it can't solve. Opus fixes this — it has a confidence threshold below which it exits the task instead of grinding.

"I spent ~$100 last week testing both against multiplication. Sonnet at 37-digit × 37-digit (~10³⁷) never quits — 15+ minutes, 211KB of output, still actively decomposing numbers when I stopped it. Opus will genuinely attempt up to ~50 digits (112K tokens on a real try), starts doubting around 55 digits, and by 80-digit × 80-digit surrenders in 330 tokens / 9 seconds with an empty answer." -- Opus, helping me with the data

The "I don't think this is worth attempting" heuristic is the difference. Sonnet doesn't have it, or has it set much higher. To get Opus and some other models to work on harder problems they assume aren't worth attempting, you have to adjust that confidence threshold.
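To make the point concrete, here's a toy sketch of the exit heuristic I'm describing. Everything here is an assumption: the linear confidence estimator, the threshold value, and the function names are mine, loosely calibrated to the digit counts quoted above, not anything from Anthropic's actual models.

```python
def estimate_confidence(digits: int) -> float:
    """Toy stand-in for the model's self-assessed odds of success.
    Confidence falls off linearly as the problem grows; the 80-digit
    zero point is an assumption fitted to the numbers quoted above."""
    return max(0.0, 1.0 - digits / 80.0)


def attempt_multiplication(digits: int, threshold: float = 0.3) -> str:
    """Return an attempt, or an empty answer if confidence is too low."""
    if estimate_confidence(digits) < threshold:
        # Opus-style behavior: surrender early with an empty answer
        # instead of grinding through tokens.
        return ""
    # Sonnet-style behavior: just keep decomposing the problem.
    return f"attempting {digits}-digit multiplication..."


print(attempt_multiplication(37))                 # attempts (confidence 0.54)
print(repr(attempt_multiplication(80)))           # '' — gives up (confidence 0.0)
print(attempt_multiplication(60, threshold=0.1))  # attempts once threshold is lowered
```

The last line is the "adjust the threshold" part: the same 60-digit problem flips from an early exit to a genuine attempt when the bar is lowered.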

I'll finish writing this up this week. I'm making flashy data visual animations to make the point right now.

zarzavat 14 minutes ago | parent | prev | next [-]

It makes sense to me.

Originally LLMs would get stuck in infinite loops generating tokens forever. This is bad, so we trained them to strongly prefer to stop once they reached the end of their answer.

However, training models to stop also gave them "laziness", because they might prefer a shorter answer over a meandering answer that actually answered the user's question.

Mathematics is unusual because it has an external source of truth (the proof assistant), and also because it requires long meandering thinking that explores many dead ends. This is in tension with what models have been trained to do. So giving them some encouragement keeps them in the right state to actually attempt to solve the problem.

brookst 31 minutes ago | parent | prev | next [-]

Do we know why it works for humans?

Models are trained on human outputs. It’s not super surprising to me that inputs following encouraging patterns produce better outputs; much of the training material reflects that.

latentsea 22 minutes ago | parent | next [-]

> Do we know why it works for humans?

Try to figure it out. You can do it.

gxs 26 minutes ago | parent | prev [-]

If I had to wager a lazy, armchair guess, I think it forces it to think harder/longer

The answer is probably more straightforward than we think, e.g. “the user thinks I can do this so I better make sure I didn’t miss anything”

CivBase 20 minutes ago | parent | prev [-]

This seems pretty obvious, no?

It's pattern matching on training material. There is almost certainly an overlap between positivity and success in the training material. Positive prompts cause the pattern matching to weight towards positivity and therefore towards more successful material.

norejisace 29 minutes ago | parent | prev | next [-]

Interesting development. It feels like AI is getting much better at symbolic reasoning, not just pattern recognition.

claysmithr 2 hours ago | parent | prev | next [-]

I wonder when AI will be able to discern the passage of time

Buttons840 26 minutes ago | parent | next [-]

Can't you just give it the time in each prompt? Would that work?

I've seen this mentioned a few times though, so I think maybe it's more complicated than this?
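The workaround suggested above can be sketched in a few lines: prepend the current wall-clock time to each prompt before sending it. This is a minimal, hypothetical sketch; `call_model` is a stand-in for whatever chat API you'd actually use, not a real library call.

```python
from datetime import datetime, timezone


def with_timestamp(user_prompt: str) -> str:
    """Prefix a prompt with the current UTC time so the model
    can reason about when the conversation is happening."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"[current time: {now}]\n{user_prompt}"


def call_model(prompt: str) -> str:
    # Placeholder: in practice this would hit an LLM API.
    return f"echo: {prompt}"


print(call_model(with_timestamp("What have I done since yesterday?")))
```

The catch, and probably why the thread says it's more complicated: a timestamp tells the model *what time it is*, but tracking *what has already happened* still requires feeding the prior conversation (or some summary of it) back in alongside it.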

1970-01-01 an hour ago | parent | prev | next [-]

It already does time in prompt-blocks. It knows time is linear and what just happened, what happened before that, and what happened before that.

claysmithr an hour ago | parent [-]

When I tried to use it as an AI CEO and Life Coach, it was never able to discern time passing, what I'd already done, or what needed to be done. It just said the same stuff over and over, stuff I'd already done. That, and it's kind of stuck in the era it was trained in. If it felt time passing like a human, maybe it would be conscious?

Nevertheless not having a sense of time makes it really bad at planning anything. I used Gemini Pro.

maplethorpe an hour ago | parent | prev [-]

Altman has estimated one year until ChatGPT is capable of measuring time passed.

https://tech.yahoo.com/ai/chatgpt/articles/chatgpt-fails-mis...

ambicapter an hour ago | parent | next [-]

Sounds like Musk setting deadlines for Mars landings.

VladVladikoff an hour ago | parent | prev [-]

Can’t tell if you are being sarcastic, but Altman’s whole job is to make bullshit near-future predictions in public about the rapid development of AI.

random__duck an hour ago | parent [-]

Thank you for stating the obvious; for some reason we need to keep repeating this. ^^;

themafia an hour ago | parent | prev [-]

There are several high value prizes for mathematical research. Let me know when an "AI" has earned one of them. Otherwise:

> When Ryu asked ChatGPT, “it kept giving me incorrect proofs,” [...] he would check its answers, keep the correct parts, and feed them back into the model

So you had a conversational calculator being operated by an actual domain expert.

> With ChatGPT, I felt like I was covering a lot of ground very rapidly

There's no way to convert that feeling into a measurement of any actual value, and we happen to know that domain experts are surprisingly easy to fool when outside of their own domains.

gxs 24 minutes ago | parent [-]

Wow that was your takeaway?

> “2025 was the year when AI really started being useful for many different tasks,” said Terence Tao

I think I’ll go out on a limb and agree with Terence Tao; I think the dude is well known in the math community, or something

p1dda 7 minutes ago | parent | next [-]

I think he means useful for mathematicians getting paid to shill for AI models

noobermin 21 minutes ago | parent | prev [-]

If anything, his simping for AI models makes me more suspicious of him than I ever was, because my own eyes show me their limits.