akomtu 12 hours ago

It would be more interesting to make it build a chess engine and compare it against Stockfish. The chess engine should be a standalone, dependency-free C/C++ program that fits in NNN lines of code.

vunderba 11 hours ago

My back-of-the-envelope guess is that 99% of LLMs given the task of building a chess engine would just end up implementing a flavor of negamax and calling it a day.

https://en.wikipedia.org/wiki/Negamax
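
For anyone unfamiliar, the core of negamax is only a few lines: score a position from the perspective of the side to move and recurse with the sign flipped. Here is a minimal sketch on a toy subtraction game rather than chess (the game, names, and scoring are illustrative assumptions; a real engine would add move generation, a position evaluation function, and alpha-beta pruning):

    // Toy negamax sketch (not chess): a pile of stones, each player
    // removes 1-3 stones per turn, and whoever takes the last stone wins.
    #include <algorithm>
    #include <cstdio>

    // Score a position from the point of view of the player to move.
    // Negamax relies on the zero-sum symmetry: my score = -(opponent's score).
    int negamax(int stones) {
        if (stones == 0)
            return -1;             // no stones left: the previous player took the last one, we lost
        int best = -2;             // worse than any reachable score
        for (int take = 1; take <= 3 && take <= stones; ++take)
            best = std::max(best, -negamax(stones - take));
        return best;
    }

    int main() {
        for (int pile = 1; pile <= 10; ++pile)
            std::printf("pile=%2d  score for side to move=%+d\n", pile, negamax(pile));
        return 0;
    }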

gpm 11 hours ago

Comparing against Stockfish isn't fair. That's comparing against enormous amounts of compute spent experimenting with strategies, training neural nets, etc.

It will lose so badly there will be no point in the comparison.

Besides, you could compare models (and harnesses) directly against each other.

akomtu 6 hours ago

Stockfish is a good reference point, an objective measure of how far LLMs have advanced.

mikkupikku an hour ago

It's not. Maybe if you used old versions of Stockfish that predate its neural-net evaluation; otherwise you'd be comparing hand-rolled (by an LLM) position evaluation functions against an NNUE, and the result of that is a foregone conclusion: Stockfish will stomp it every time.

Maybe that's the result you want for some sort of rhetorical reason, but it would nonetheless not be an informative test.
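
To make the contrast concrete, the kind of hand-rolled evaluation an LLM is likely to write is a fixed material count along these lines (the piece values and board representation below are my own illustrative assumptions, not Stockfish's); an NNUE replaces exactly this kind of function with a trained network:

    // Sketch of a hand-rolled evaluation: sum fixed material values
    // (in centipawns) for each side, from white's point of view.
    #include <array>
    #include <cstdio>

    // Centipawn values indexed by piece type: P N B R Q (king excluded).
    constexpr std::array<int, 5> kValue = {100, 320, 330, 500, 900};

    // Evaluate a position given piece counts for white and black.
    int evaluate(const std::array<int, 5>& white, const std::array<int, 5>& black) {
        int score = 0;
        for (int p = 0; p < 5; ++p)
            score += kValue[p] * (white[p] - black[p]);
        return score;
    }

    int main() {
        // Starting material: 8 pawns, 2 knights, 2 bishops, 2 rooks, 1 queen per side.
        std::array<int, 5> white = {8, 2, 2, 2, 1};
        std::array<int, 5> black = {8, 2, 2, 2, 1};
        black[4] = 0;  // suppose black has lost the queen
        std::printf("eval (white's view): %+d centipawns\n", evaluate(white, black));
        return 0;
    }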

ykhli 9 hours ago

oh that is super interesting. ty for the idea!