simianwords 2 hours ago

False, chess Elo is pretty good:

https://maxim-saplin.github.io/llm_chess/

Let's not cherry-pick and actually look at benchmarks, please. I would say even ~1000 Elo means that it can reason better than the average human.

bigfishrunning 2 hours ago | parent [-]

If you look at the "workflow" section of that page, they had to add a bunch of scaffolding to tell the model which moves are legal -- an LLM can't keep enough context to know how to play chess, only to choose an advantageous move from a given list. But feel free to "cherry pick".

simianwords 2 hours ago | parent [-]

Why do you think this falsifies the claim that it can reason?