xg15 5 days ago

The example is a bit of an unfair comparison though. It takes questions already posed in natural language and, as far as I can tell, expects results in natural language, too.

This means that whatever system is evaluated in this challenge necessarily has to deal with natural language. And indeed, a big part of the AlphaProof system was a neural network to convert from natural language to Lean.
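To make that conversion step concrete, here is a toy illustration of the kind of output an autoformalizer targets, assuming Lean 4 syntax (the statement, theorem name, and proof are illustrative, not taken from AlphaProof):

```lean
-- Illustrative only: the natural-language claim
-- "for every natural number n, n + 0 = n"
-- rendered as a Lean 4 theorem statement.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl
-- The autoformalizer's job is producing the *statement*;
-- finding the proof term (here just `rfl`) is a separate search step.
```

The point is that translating between the English phrasing and this formal statement is a language task, distinct from whatever reasoning finds the proof.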

None of this has anything to do with reasoning ability.

I think it would be interesting to present the inverse challenge, where the problems are already posed in a formal language. Would a network that first converts them into natural language, then does chain-of-thought reasoning on that, and then translates the result back into formal language still beat a simple symbolic reasoner that operates on the formal language directly?