ants_everywhere | 3 days ago
I'm not at all surprised that LLMs are good at interactive theorem proving. But I am surprised that Claude Code is good at it. Maybe I'm just not using it right, but when I have a task in a less popular programming language, Gemini 2.5 Pro tends to be much better than Claude Opus despite Opus doing better on benchmarks. Better in the sense that Gemini will typically bang out a correct solution with best practices immediately, while Opus might take half a day of correcting basic errors and misunderstandings. If anyone knows what I'm talking about and knows a way to improve this, please let me know.

Claude still feels to me like an over-eager engineer that is more eager to get things done fast than to do them correctly. It may be that with a task like theorem proving you can just burn credits and time, because there's a clear answer.
jsw97 | 3 days ago | parent
I use the Gemini MCP, and my CLAUDE.md has instructions to consult Gemini for strategy. Seems to work well. I have the lowest Claude plan, so I don't know how this would compare against Opus, for example.

Separately, I have been meaning to implement a cheating detector: I have run into Claude modifying problem statements, adding axioms, etc.
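A rough sketch of what I have in mind (names and heuristics here are made up, not a real tool): grep the model's proof output for Lean escape hatches like `sorry`, `admit`, or newly introduced `axiom` declarations, and check that the original theorem statement still appears verbatim:

```python
import re

# Heuristic flags for Lean "cheating": proof escape hatches (sorry/admit)
# anywhere, or an axiom declaration at the start of a line. This is a
# crude textual check -- it would also fire on comments or strings.
SUSPECT = re.compile(r"\b(sorry|admit)\b|^\s*axiom\b", re.MULTILINE)

def audit(proof_src: str, original_statement: str) -> list[str]:
    """Return a list of suspected cheats in the model-produced proof."""
    issues = []
    if SUSPECT.search(proof_src):
        issues.append("escape hatch: sorry/admit/axiom found")
    # If the original statement no longer appears verbatim, the model
    # may have weakened or rewritten the problem.
    if original_statement not in proof_src:
        issues.append("problem statement was modified")
    return issues
```

A verbatim-substring check is brittle against harmless whitespace changes; a sturdier version would compare elaborated statements via the Lean toolchain itself, but even this catches the blatant cases.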