dogscatstrees 2 hours ago
> As they did so, they also learned how to improve the prompts they gave AlphaEvolve. One key takeaway: The model seemed to benefit from encouragement. It worked better "when we were prompting with some positive reinforcement to the LLM," Gómez-Serrano said. "Like saying 'You can do this' — this seemed to help. This is interesting. We don't know why."

Four of the top logicians in the world are acknowledging this. It is mind-blowing, and we don't know why.
dataviz1000 2 hours ago
I know why. Several people had problems with Sonnet burning through all their credits grinding on a problem it can't solve. Opus fixes this: it has a confidence threshold below which it exits the task instead of grinding.

"I spent ~$100 last week testing both against multiplication. Sonnet at 37-digit × 37-digit (~10³⁷) never quits: 15+ minutes, 211KB of output, still actively decomposing numbers when I stopped it. Opus will genuinely attempt up to ~50 digits (112K tokens on a real try), starts doubting around 55 digits, and by 80-digit × 80-digit surrenders in 330 tokens / 9 seconds with an empty answer." -- Opus, helping me with the data

The "I don't think this is worth attempting" heuristic is the difference. Sonnet doesn't have it, or has it set much higher. To get Opus and some other models to work on harder problems they assume aren't worth attempting, you have to raise their confidence first.

I'll finish writing this up this week. I'm making flashy animated data visualizations to make the point right now.
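A minimal sketch of how a digit-scaling test like that could be generated and graded. This is my own illustration of the setup, not the commenter's actual harness; the model call itself is omitted, and `make_problem`/`grade` are hypothetical names:

```python
import random

def make_problem(n_digits, rng=random.Random(0)):
    """Generate an n-digit x n-digit multiplication problem.

    Returns the two factors and the exact product as a string,
    so the grader never depends on the model for ground truth.
    """
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    return a, b, str(a * b)

def grade(answer_text, expected_product):
    """Classify a model reply.

    An empty reply counts as a surrender (the 80x80 Opus case above);
    otherwise the last run of digits in the reply is compared against
    the exact product.
    """
    digit_runs = [tok for tok in answer_text.replace(",", "").split()
                  if tok.isdigit()]
    if not digit_runs:
        return "surrendered"
    return "correct" if digit_runs[-1] == expected_product else "wrong"

# Usage sketch: generate a 37-digit problem, send the prompt to a model,
# then grade whatever text comes back.
a, b, expected = make_problem(37)
prompt = f"Compute {a} * {b} exactly. Reply with only the product."
```

The point of splitting generation from grading is that exact integer arithmetic is free in Python, so "surrendered vs. wrong vs. correct" can be measured mechanically at any digit count.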
zarzavat 2 hours ago
It makes sense to me. Originally LLMs would get stuck in infinite loops, generating tokens forever. This is bad, so we trained them to strongly prefer stopping once they reached the end of their answer. However, training models to stop also made them "lazy": they might prefer a shorter answer over a meandering one that actually answers the user's question.

Mathematics is unusual because it has an external source of truth (the proof assistant), and also because it requires long, meandering thinking that explores many dead ends. This is in tension with what models have been trained to do. So giving them some encouragement keeps them in the right state to actually attempt to solve the problem.
brookst 2 hours ago
Do we know why it works for humans? Models are trained on human outputs. It's not super surprising to me that inputs following encouraging patterns produce better outputs; much of the training material reflects that.
CivBase 2 hours ago
This seems pretty obvious, no? It's pattern matching on training material. There is almost certainly an overlap between positivity and success in the training material. Positive prompts cause the pattern matching to weight toward positivity and therefore toward more successful material.