og_kalu 4 days ago
He is testing several models, some of which cannot reliably output legal moves. That's different from saying that all models, including the one he thinks understands chess, can't generate a legal move in 10 tries. 3.5-turbo-instruct's illegal move rate is about 5 or fewer in 8205.
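To put that figure in perspective, here is a quick back-of-the-envelope sketch. It assumes the 8205 counts individual move attempts and that retries are independent, neither of which is stated above, so treat the second number as a rough bound rather than a measurement.

```python
# Figures quoted above: at most 5 illegal moves out of 8205.
# Assumption: 8205 is the number of move attempts.
illegal = 5
total = 8205

rate = illegal / total
print(f"illegal move rate <= {rate:.4%}")  # prints "illegal move rate <= 0.0609%"

# Chance that 10 retries in a row are all illegal at that rate.
# Assumption: retries are independent, which real sampling may not be.
p_ten_in_a_row = rate ** 10
print(f"P(10 illegal in a row) ~ {p_ten_in_a_row:.1e}")
```

At a rate that low, failing 10 times in a row would be essentially unheard of, which is why the two claims are so different.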
IanCal 4 days ago
I also wonder what kind of invalid moves they are. There's "you can't move your knight to j9, that's off the board", "there's already a piece there" and "actually that would leave you in check". I think it's also significantly harder to play chess if you hear a sequence of moves over the phone and have to reply with a follow-up move, with no space or time to think or talk through moves.
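A rough sketch of how those three kinds of invalid move could be told apart. The board encoding (`"wN"` for a white knight and so on), the `classify` helper and the sample position are all made up for illustration, and the third case is only flagged, since deciding whether a move leaves you in check needs a real move generator.

```python
FILES = "abcdefgh"
RANKS = "12345678"

def on_board(square: str) -> bool:
    """True if square is a two-character name like 'e4' inside the 8x8 board."""
    return len(square) == 2 and square[0] in FILES and square[1] in RANKS

def classify(board: dict[str, str], side: str, dest: str) -> str:
    """Bucket a destination square for the given side ('w' or 'b').

    board maps occupied squares to piece codes such as 'wN' or 'bP'
    (a hypothetical encoding, not taken from any library).
    """
    if not on_board(dest):
        return "off the board"
    occupant = board.get(dest)
    if occupant is not None and occupant[0] == side:
        return "own piece already there"
    # Piece movement rules and the in-check test are deliberately not modelled.
    return "needs a full legality check"

board = {"g1": "wN", "e2": "wP", "f3": "wP"}
print(classify(board, "w", "j9"))  # off the board
print(classify(board, "w", "e2"))  # own piece already there
print(classify(board, "w", "h3"))  # needs a full legality check
```

The first two buckets only need the move text and the current position, while the third needs the rules of the game, so a breakdown by bucket would say a lot about what the models are getting wrong.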