Remix.run Logo
og_kalu 10 months ago

He is testing several models, some of which cannot reliably output legal moves. That's different from saying all models including the one he thinks understands can't generate a legal move in 10 tries.

3.5-turbo-instruct's illegal move rate is about 5 or less in 8205

IanCal 10 months ago | parent [-]

I also wonder what kind of invalid moves they are. There's "you can't move your knight to j9 that's off the board", "there's already a piece there" and "actually that would leave you in check".

I think it's also significantly harder to play chess if you were to hear a sequence of moves over the phone and had to reply with a followup move, with no space or time to think or talk through moves.