toxik 3 days ago
Predicting the next moves of some expert chess policy is just imitation learning, a well-studied approach. You can add return-to-go as an extra input to let the network try to learn what kinds of moves are made in good vs. bad games, which would be an offline RL regime (e.g., Decision Transformers). I suspect chess skill is completely useless for LLMs in general and not an emergent phenomenon; it just consumes gradient bandwidth and parameter space to do this neat trick. This is clear to me because LLMs that aren't trained specifically on chess do not play chess well.
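For anyone unfamiliar with the term, here is a minimal sketch of how return-to-go can be computed for one recorded game. The function name `returns_to_go`, the reward scheme (0 for every move, +1 or -1 on the final result), and the `gamma` parameter are assumptions for illustration, not taken from any particular codebase. Each value is the total reward still to come from that move onward, which is what the network would be conditioned on alongside the board state.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return, for each timestep t, the (optionally discounted) sum of rewards from t onward."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step reuses the sum already computed after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Hypothetical 4-move games where only the final outcome is rewarded.
won_game = returns_to_go([0.0, 0.0, 0.0, 1.0])
lost_game = returns_to_go([0.0, 0.0, 0.0, -1.0])
print(won_game)   # every move in the won game is tagged with return 1.0
print(lost_game)  # every move in the lost game is tagged with return -1.0
```

With a sparse win/loss reward like this, every move in a game simply gets labeled with that game's outcome, so at inference time you could ask the model for moves "as played in a game that ends with return 1.0".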
PaulHoule 3 days ago | parent
In either language or chess, I'm still a bit baffled by how a representation over continuous variables (differentiable, no less) works for something discrete such as words, letters, or chess moves. Add the word "not" to a sentence and the result is not a perturbation of the meaning but a reversal (or is it?). A difference between communication and chess is that your partner in conversation is your ally in meaning-making and will help fix your mistakes, which is how LLMs get away with bullshitting. ("Personality" makes a big difference: by the time you are telling your programming assistant "Dude, there's a red squiggle on line 92", you are under its spell.) Chess, on the other hand, is adversarial, and your mistakes are just mistakes that your opponent will take advantage of. If you make a move and your hunch that your pieces are not in danger is just slightly wrong (one piece in danger), that's almost as bad as having all your non-king pieces in danger, since your opponent can only take one of them next turn.