thecupisblue 7 days ago

Ironically, that LessWrong article is more wrong than right.

First, chess is perfect for such modeling. The game is basically a tree of legal moves, so the "world model" is already encoded in the dataset itself. At a certain scale the chance of making an illegal move is minimal, since the training data contains overwhelmingly more legal moves than illegal ones, especially when you train on a dedicated chess dataset like a PGN one.
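To make that concrete, here is a minimal sketch of how you'd measure it, assuming the python-chess library, a hypothetical `games` iterable of SAN move lists from a PGN dataset, and a hypothetical `model_next_move()` wrapper around the model:

    import chess

    def legality_rate(games, model_next_move):
        """Fraction of model-proposed moves that are legal, measured by
        replaying positions from a PGN-derived dataset."""
        legal, total = 0, 0
        for moves in games:                        # each game: list of SAN moves
            board = chess.Board()
            for san in moves:
                proposed = model_next_move(board)  # model's guess at this position
                try:
                    board.parse_san(proposed)      # raises ValueError if illegal here
                    legal += 1
                except ValueError:
                    pass
                total += 1
                board.push_san(san)                # follow the actual game
        return legal / total

A high legality rate here is exactly what you'd expect from next-token prediction on a corpus that contains almost nothing but legal continuations.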

Second, the probing is quite... subjective.

We are cherry-picking activations across an arbitrary number of dimensions, on a model specifically trained for chess, taking these arbitrary representations and projecting them onto a 2D graph.

Well yeah, with enough dimensions and cherry-picking, we can also "show" that all zebras are elephants: all elephants are horses, and look, their weights overlap in so many dimensions - large four-legged animals you see on safari! Especially if we tune a dataset for it.
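A toy demonstration of why high-dimensional probes deserve skepticism (nothing chess-specific, just numpy and scikit-learn): with many more dimensions than samples, a linear probe fits pure noise perfectly.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4096))        # 200 fake "activations", 4096 dims
    y = rng.integers(0, 2, size=200)        # labels with no relation to X

    probe = LogisticRegression(max_iter=1000).fit(X, y)
    print(probe.score(X, y))                # ~1.0: perfect fit on noise

    X_test = rng.normal(size=(200, 4096))   # fresh noise
    y_test = rng.integers(0, 2, size=200)
    print(probe.score(X_test, y_test))      # ~0.5: no real structure

Without held-out controls like this, a probe that "finds" board state in activations is hard to distinguish from a probe that found it in the dimensionality.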

In the end, this shows nothing more than "training an LLM on a constrained move dataset makes it great at predicting the next move in that dataset".

flender 7 days ago | parent [-]

And if it knew every possible board configuration and the optimal move for each, it could do as well as possible. But if it instead just recognized "this looks like a chess game" and handed the position to an optimized tool to determine the next move, that would seem a better use of training.
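A rough sketch of that dispatch idea, assuming the python-chess library and a local Stockfish binary on PATH (both assumptions, not anything from the article):

    import io
    import chess.pgn
    import chess.engine

    def next_move_via_tool(prompt: str) -> str:
        """Detect a PGN-like prompt and delegate to a real engine
        instead of predicting the next token."""
        game = chess.pgn.read_game(io.StringIO(prompt))
        if game is None:
            raise ValueError("prompt does not look like a chess game")
        board = game.end().board()           # position after the last move
        with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
            result = engine.play(board, chess.engine.Limit(time=0.1))
        return board.san(result.move)

The routing step ("does this look like chess?") is the only part the LLM is actually good for; the move itself is better left to the tool.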

thecupisblue 7 days ago | parent [-]

Way better use. At this point that engine is more like the world's most expensive Monte Carlo search.