_diyar 6 days ago

We expect this approach to work because it's currently the best-performing approach we have; nothing else comes close.

Using symbolic language is a good idea in theory, but in practice it doesn't scale as well as auto-regression + RL.

DeepMind's IMO results illustrate this well: in 2024 they reached silver-medal level using AlphaProof and AlphaGeometry, which express problems and proofs in the Lean formal language[1]. In 2025 they performed better, and faster, using an advanced version of Gemini working purely in natural language[2].

[1] https://deepmind.google/discover/blog/ai-solves-imo-problems...

[2] https://deepmind.google/discover/blog/advanced-version-of-ge...
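For readers unfamiliar with Lean: posing a problem "formally" means turning a natural-language statement into a machine-checkable theorem. A toy illustration (my own example, nothing to do with the actual IMO formalizations):

    -- Natural language: "for all natural numbers a and b, a + b = b + a"
    -- The same statement, plus a proof, in Lean 4:
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

The 2024 system had to produce proofs at this level of formality; the 2025 system skipped that layer entirely.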

Note: I agree with the parent comment's suggestion that letting models reason in latent space might make sense, but that's where I'm out of my depth.

xg15 5 days ago

The example is a bit of an unfair comparison though. It takes questions already posed in natural language and, as far as I can tell, expects results in natural language, too.

This means that whatever system is evaluated in this challenge necessarily has to deal with natural language. And indeed, a big part of the AlphaProof system was a neural network to convert from natural language to Lean.

None of this has anything to do with reasoning ability.

I think it would be interesting to present an inverse challenge, where the problems are already posed in a formal language. Would a network that first converts them into natural language, then does chain-of-thought on that, then translates the result back into formal language still be better than a simple symbolic reasoner that could operate on the formal language directly?
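For concreteness, here is a minimal sketch (in Python, with made-up facts and rules rather than any real benchmark) of the kind of "simple symbolic reasoner that operates on the formal language directly": forward chaining over propositional Horn clauses, with no natural-language step anywhere.

    # Made-up facts and rules; propositional Horn clauses only.
    Rule = tuple[frozenset[str], str]  # (premises, conclusion); empty premises = fact

    def forward_chain(rules: list[Rule], goal: str) -> bool:
        """Repeatedly derive new facts until the goal appears or nothing changes."""
        known: set[str] = {head for body, head in rules if not body}
        changed = True
        while changed:
            changed = False
            for body, head in rules:
                if head not in known and body <= known:
                    known.add(head)
                    changed = True
        return goal in known

    rules: list[Rule] = [
        (frozenset(), "p"),            # fact: p
        (frozenset({"p"}), "q"),       # rule: p -> q
        (frozenset({"p", "q"}), "r"),  # rule: p and q -> r
    ]
    print(forward_chain(rules, "r"))   # True: r is derivable without leaving the formal language

The question the inverse challenge would answer is whether a translate-to-natural-language-and-back pipeline beats something this direct on problems that are already formal.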

safety1st 6 days ago

Very interesting stuff, thanks!