helloplanets 11 hours ago

I wonder how much fine tuning against something like Stockfish top moves would help a model in solving novel middle game positions. Something like this format: https://database.lichess.org/#evals

I'd be pretty surprised if it did help in novel positions. Which would make this an interesting LLM benchmark honestly: Beating Stockfish from random (but equal) middle game positions. Or to mix it up, from random Chess960 positions.

Of course, the basis of the logic the LLM would play with would come from the engine used for the original evals. So beating Stockfish from a dataset based on Stockfish evals would seem completely insufficient.
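For anyone who wants to try the fine-tuning idea above, here is a minimal sketch of turning eval records into prompt/completion pairs. It assumes each line of the export is a JSON object with a `fen` field and an `evals` list whose entries carry `depth` and `pvs` (each PV having a space-separated UCI `line`), and that the first PV is the engine's top choice. The function name and the prompt wording are made up for illustration; check the field names against the actual dump before relying on this.

```python
import json


def best_move_examples(jsonl_lines):
    """Yield prompt/completion dicts from lichess-style eval records.

    Assumed record shape (verify against the real export):
    {"fen": "...", "evals": [{"depth": 36, "pvs": [{"cp": 31, "line": "e2e4 e7e5"}]}]}
    """
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        evals = record.get("evals") or []
        if not evals:
            continue
        # Prefer the deepest available analysis of this position.
        deepest = max(evals, key=lambda e: e.get("depth", 0))
        pvs = deepest.get("pvs") or []
        if not pvs or not pvs[0].get("line"):
            continue
        # Assumption: the first principal variation is the top engine line,
        # and its first token is the recommended move in UCI notation.
        top_move = pvs[0]["line"].split()[0]
        yield {
            "prompt": f"FEN: {record['fen']}\nBest move:",
            "completion": " " + top_move,
        }
```

Usage would be something like `for ex in best_move_examples(open(path)): ...`, writing each dict out as one line of training data. Note that this only teaches the model to imitate the engine's top move, which is the limitation raised above.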

ActivePattern 2 hours ago | parent [-]

I am quite confident that an LLM will never beat a top chess engine like Stockfish. An LLM is a generalist -- it contains a lot of world knowledge, and nearly all of it is completely irrelevant to chess. Stockfish is a specialist tuned specifically to chess, and hence able to spend its FLOPs much more efficiently towards finding the best move.

The most promising approach would be to tune a reasoning LLM on chess via reinforcement learning, but fundamentally, the way an LLM reasons (i.e., outputting a stream of language tokens) is far less efficient than the way a chess engine reasons (direct search of the game tree).
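To make "direct search of the game tree" concrete, here is a toy sketch using negamax on a simple take-away game (remove 1 to 3 stones, whoever takes the last stone wins). This is only an illustration of the idea; the game, the function names, and the memoization are assumptions chosen for brevity, and a real chess engine layers pruning and a learned or handcrafted evaluation on top of this kind of search.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # hypothetical toy game: take 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)
def negamax(stones):
    """Return +1 if the side to move wins with best play, otherwise -1."""
    if stones == 0:
        # The previous player took the last stone, so the side to move lost.
        return -1
    # A position's value is the best of the negated values of its children.
    return max(-negamax(stones - take) for take in MOVES if take <= stones)


def best_take(stones):
    """Pick the move leading to the child position that is worst for the opponent."""
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: -negamax(stones - take))
```

Each call visits successor positions directly and compares numeric values, with no text generated along the way, which is the efficiency gap being described.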