GistNoesis 7 days ago

We should have solved chess already.

We should be aiming to solve chess, but we are not even trying.

If the complexity of chess is finite, this should be possible.

Chess AIs have become so good that it may seem there is no more progress to be made.

We must abandon the concept of "valuation".

Here is the criterion for solving chess.

A chess engine is a constant-time (or at least bounded-time, bounded-memory) function f which takes any position and returns 1 ("win"), 0 ("draw"), or -1 ("loss") from the perspective of the side to move (so with white to move, 1 means "white wins").

A chess engine solves chess if we can't find any violation of the Bellman equation:

f(gamestate) = max over legal moves of ( -f(gamestate + move) ), if there are legal moves; or f(gamestate) = game result, if there are no legal moves

This function f can be encoded either as a neural network or as code, but it must be computable in constant (or bounded) time.

The whole problem of solving chess is then reduced to mining the gamestate space for counterexamples.

As in math, you can conjecture that your chess engine has solved the game, and the conjecture stands until someone else finds a violation (either the engine doesn't compute in bounded time, or the Bellman equation fails somewhere).

To verify a chess engine you can just sample a random gamestate, or a known-to-be-difficult gamestate, and check whether the equation holds; that costs at most (maximum number of legal moves + 1) function evaluations.

You can try applying this concept to simple games like tic-tac-toe, or trying to compress endgame table with a neural network.
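Taking up the tic-tac-toe suggestion, here is a minimal sketch of the idea (my own illustration, assuming the convention above that values are from the side to move's perspective), with a memoized negamax standing in for the oracle f:

```python
# Sketch: the "solved" criterion above, applied to tic-tac-toe.
# A memoized negamax plays the role of the bounded-time oracle f; we then check
# the Bellman equation f(s) = max over moves of -f(s + move) at any state.
from functools import lru_cache

WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in WINS:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board, player):
    return [board[:i] + player + board[i+1:]
            for i, ch in enumerate(board) if ch == '.']

@lru_cache(maxsize=None)
def f(board, player):
    """+1 win / 0 draw / -1 loss, from the side to move's perspective."""
    if winner(board) is not None:
        return -1           # the previous player just won: side to move lost
    children = moves(board, player)
    if not children:
        return 0            # full board, no winner: draw
    other = 'O' if player == 'X' else 'X'
    return max(-f(child, other) for child in children)

def bellman_violation(board, player):
    """True if f is inconsistent here (cost: #legal moves + 1 evaluations)."""
    if winner(board) is not None or '.' not in board:
        return False
    other = 'O' if player == 'X' else 'X'
    return f(board, player) != max(-f(child, other)
                                   for child in moves(board, player))
```

By construction a full negamax has zero violations; the interesting experiment is pointing `bellman_violation` at a learned or compressed stand-in for `f`, e.g. a small network trained on a subset of states.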

We can find such a candidate function by training a neural network to minimize the expected number of Bellman-equation violations over various datasets of gamestates. Once we "grok" it down to zero, we are done. To soften the problem you can train an auxiliary continuous function g which outputs the probabilities of 1, 0, and -1 (with a softmax) and add a discount factor as in reinforcement learning, but the final result is the discrete argmax of it (as in "deterministic policy gradient"), with no discount factor.

Once you have this finite-time oracle, playing chess is just wandering along the graph of drawable positions to push your adversary toward a position where their engine violates the Bellman equation, i.e. mis-evaluates the position. If they step into it, you enter the winnable positions and stay there until the end. (Not all violations can be exploited, since the adversary can sometimes avoid entering "uncertain" positions.)

A simpler (less strong) chess strategy may avoid entering "uncertain" positions whenever earlier legal positions offer a preferred alternative (a 4-state logic: win, draw, lose, uncertain; or an ordered list of moves). These variants make the problem easier but complicate the corresponding Bellman equation.

So the game once again becomes exploring the space of "drawable" positions to find positions from which the adversary can't avoid entering an uncertain (and losing) position. That is, playing the adversary rather than the board, which is a harder problem unless the adversary plays perfectly. In other words: playing for the win.

jibal 7 days ago | parent | next [-]

This comment and your other comments are simply wrong and full of nonsense. Endgame table generators are pure solvers ... given enough time, they can solve chess from the initial position. But the amount of time is longer than the time until the heat death of the universe, and to record the best-play game tree--without which a solution isn't useful--would take more material than there is in the universe.

GistNoesis 7 days ago | parent [-]

>This comment and your other comments are simply wrong and full of nonsense

That's because you are not understanding me.

My conviction is that the complexity of chess is not as high as we think, and that there exists a neural network of fewer than 10B float32 weights which can encode perfect play.

Neural network evaluations are now used as heuristics to evaluate positions, and even without tree search they play very well, but they are usually very small and complemented by a few high-level features as input. A bigger network, well fine-tuned, could probably reach perfect play without any tree search.

A thought exercise: try compressing an endgame tablebase with a neural network and see how big it needs to be to reach perfect play. The thing is: you don't need to train it on all games from the tablebase before it converges to perfect play.

You can know how close you are to optimal by counting the number of Bellman-equation violations (or by not observing any).

You can even train it by having it reference the previously trained oracle for smaller endgames. You solve chess with a neural network when there are only 2 pieces. Then you solve chess for 3 pieces, possibly using the 2-piece oracle. Then you solve chess for 4 pieces using the 3-piece oracle, and so on, until you reach chess with 32 pieces.

Adding pieces only increases complexity up to a point.

jll29 7 days ago | parent | next [-]

>the complexity of chess is not as high as we think, and that there exist a neural network of less than 10B float32 weights which can encode perfect play.

That is certainly not the mainstream view, so you will have to support your conjecture by some evidence if you would like to convince people that you are right (or demonstrate it empirically end-to-end).

GistNoesis 6 days ago | parent | next [-]

My conviction is based on the interesting phenomenon of game tree collapsing.

When you get better at the game, the number of "candidate" moves you need to explore goes down. For example, when you are naive at chess you need to explore all legal moves, and then all legal replies to those moves. The number of nodes in this tree grows exponentially, so the search is limited to a small depth.

But when your list of "candidate" moves is reduced to size 1, the number of nodes in the tree equals its depth, and you can unroll the line very cheaply. (That's the concept of "lines" in chess; in Go it's the concept of "ladders".)

When your candidate list is of size 2, the game tree has roughly 2^(depth+1) nodes.

You can have a fractional candidate-list size, by only sometimes needing to explore deeper in the tree, or only sometimes needing to explore 2 candidates.

The complexity grows as (fractional candidate list size)^(depth+1), with the FCLS between 1 and 2. A phase transition occurs that lets you go much deeper in the game tree, which simplifies things and makes them tractable (because you then reach endgame tablebases, or an equivalence-class number, i.e. you don't necessarily know the result yet, you just know it will be the same as for all games of that type).

So your function is allowed to have a smaller inner function, called recursively within a computational budget, to explore the game tree of candidate moves and help make a decision. The outer function resolves the equivalence numbers to their "win"/"draw"/"lose" values, always in the same way, and takes the appropriate maximum (remember, your network's only goal is to be always consistent).

The better your network gets at managing its computational budget, the deeper it can go. You can think of it as a version of alpha-beta pruning with ever-improving heuristics.
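A toy rendering of this inner/outer scheme, using the 1-2-3 subtraction game in place of chess and an invented `guess` heuristic for when the budget runs out (both are my assumptions, not anything from an actual engine):

```python
# Sketch: a recursive inner search under a node budget, falling back to a
# heuristic guess when the budget is exhausted. Game: n sticks, remove 1-3
# per turn, taking the last stick wins; values are for the side to move.
def budgeted_f(n, budget, guess=lambda n: 0):
    def inner(n, budget):
        if budget <= 0:
            return guess(n), 0            # budget exhausted: trust the heuristic
        if n == 0:
            return -1, budget - 1         # no sticks left: side to move has lost
        best, budget = -1, budget - 1
        for k in (1, 2, 3):
            if k <= n:
                v, budget = inner(n - k, budget)
                best = max(best, -v)      # negamax over remaining candidates
        return best, budget
    return inner(n, budget)[0]
```

With a generous budget this reproduces exact play; with budget 0 it degenerates to the raw heuristic, which is the trade-off the paragraph above describes.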

jibal 7 days ago | parent | prev [-]

But ... but ... it's his conviction.

jibal 7 days ago | parent | prev | next [-]

I do understand you, but you and your "conviction" are wrong. Apparently you aren't even familiar with AlphaZero.

> Adding pieces only increases complexity up to a point.

As you said, until you reach 32 pieces. What you vastly underestimate is how much complexity is added at each level. You're like the king in the fable who agreed to give the vizier a small amount of grain: 1 grain for the first square, 2 grains for the second square, 4 grains for the third square, etc. The king thought he was getting a bargain.

> The thing is: you don't need to train it on all games from the endgame table before it converges to perfect play.

But you do, because there is no algorithmic simplification, at all. Strong chess players understand that while there are common patterns throughout chess, their application is highly specific to the position. That's why we have endgame tables, which are used to solve positions that pattern matching doesn't solve. You can get excellent play out of an NN, but that's not the same as solving it. And the absence of Bellman violations is necessary, but not sufficient ... you can't use it to prove that you've solved chess. The fact is that it is impossible within pragmatic limits to prove that chess has been solved. But so what? Programs like AlphaZero and Stockfish already play well enough for any purpose.

Anyway, you're free to go implement this ... good luck. I won't respond further.

GistNoesis 7 days ago | parent [-]

>the absence of Bellman violations is necessary, but not sufficient

It is sufficient, though. Every chess game ends in a finite number of moves. If you are consistent, i.e. you have zero violations, then thinking backward (dynamic programming) you can "color" all final positions correctly. Because you are consistent, you also color correctly all positions one move before the end, then two moves before the end... and so, recursively, you have correctly colored all chess positions.

You are missing the complexity collapse which can occur in games, for example the game of Nim, where a simple function can predict the outcome. With 15 sticks, from which you can remove up to 3 per turn, naively one would think there are 2^15 game states and "15 choose 3" legal moves per turn, but in fact there are equivalence classes, which reduce the game state to only about 15 distinct states.
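The exact rules above are ambiguous, so as an assumption take the common reading: remove 1, 2, or 3 sticks per turn, and taking the last stick wins. Under those rules the collapse is total; a constant-time oracle with zero Bellman violations is one line:

```python
# Sketch: a constant-time oracle for the 1-2-3 subtraction game ("Nim" above),
# certified against the Bellman equation for every reachable state.
def f(n):
    """+1 if the side to move wins with perfect play, -1 otherwise."""
    return 1 if n % 4 != 0 else -1

def bellman_consistent(n):
    if n == 0:                 # previous player took the last stick and won
        return f(n) == -1
    return f(n) == max(-f(n - k) for k in (1, 2, 3) if k <= n)

# Exhaustive check over all 16 equivalence classes of the 15-stick game:
assert all(bellman_consistent(n) for n in range(16))
```

This is the shape of the claim being made for chess: a function far smaller than the state space, validated purely by the absence of violations. Whether chess admits any comparable collapse is exactly what is in dispute in this thread.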

Through the prism of a trained neural network, game states get grouped into equivalence classes, and the same phenomenon occurs in chess, allowing simplification via high-level rules like "white wins because white wins the pawn race".

I am more familiar with Stockfish than with AlphaZero or LeelaChessZero, but Stockfish demonstrates that well-engineered features can bring the neural network size down a lot. In particular, counting usually poses problems for neural networks, and counts like the number of moves before the 50-move rule triggers, or the number of moves before a pawn race resolves, are edge cases that can be simplified away (DTZ and DTM).

Also, these engines are trying to compress the evaluation function, which carries much more information than just whether the position is a win, draw, or loss, i.e. just the frontier.

jibal 7 days ago | parent [-]

> It is sufficient though. All chess games end in a finite number of moves.

Again, the issue is the size of that number, not that it is finite.

> You are missing the complexity collapse which can occur in games, like for example the game of Nim

All you have is ad hominems ... "you don't understand", "you are missing" ...

I'm not missing the intellectual dishonesty of obviously irrelevant examples like Nim, Rubik's cube, or cryptanalysis that have clear mathematical regularities completely lacking from chess.

Now stop insulting me and my intelligence. Again, if you want to go implement this, or write a journal paper, then have at it. But this "we" should have solved chess already, and "we" should be aiming to solve chess, but "we" are not even trying is arrogant trolling.

Over and out.

throw-qqqqq 7 days ago | parent | prev [-]

> My conviction is that the complexity of chess is not as high as we think, and that there exists a neural network of fewer than 10B float32 weights which can encode perfect play

Given the number of possible games (above 10^80), you would need EXTREME sparsity to encode them in fewer than 10B / 10^10 params. Sounds information-theoretically impossible to me ¯\_(ツ)_/¯

Leela Chess Zero has hundreds of millions to a few billion parameters AFAIK.

The argument about the game being finite and thus solvable is misguided IMO.

AES encryption is also finite, and you can enumerate all possible combinations, but not before the end of time...

tsimionescu 7 days ago | parent | prev | next [-]

> We should be aiming to solve chess, but we are not even trying.

We have known exactly how to solve chess for decades. The function f is called minimax, and it can be optimized with techniques like alpha-beta pruning. Given that chess is a bounded game, this is a bounded-time and bounded-space algorithm. The definition of `f` that you gave can actually be quite directly encoded in Haskell and executed (though it will miss some obvious optimizations).
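For concreteness, the parent's definition of f transcribes almost literally into code (Python here rather than Haskell); `legal_moves`, `apply_move`, and `result` are assumed callbacks describing whatever game you plug in:

```python
# Sketch: the definition of f, transcribed directly as negamax.
# For chess this terminates in bounded time, but astronomically slowly.
def f(state, legal_moves, apply_move, result):
    moves = legal_moves(state)
    if not moves:
        return result(state)   # -1 / 0 / +1 from the side to move's view
    return max(-f(apply_move(state, m), legal_moves, apply_move, result)
               for m in moves)

# Toy instantiation: the 1-2-3 subtraction game (take 1-3 sticks per turn;
# taking the last stick wins, so having no move means you have lost).
value = f(5,
          lambda n: [k for k in (1, 2, 3) if k <= n],
          lambda n, k: n - k,
          lambda n: -1)
```

The point of the paragraph above is that this function is trivially correct and hopelessly slow on chess; everything interesting lies in the pruning.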

The problem is that this algorithm seems to be close to optimal, and it would still take a few thousand years of computation to actually run it and solve chess (or was it decades, or millions of years? not really that relevant).

Now, of course, no one has actually proved that this is the optimal algorithm, so for all we know, there exists a much simpler `f` that could take milliseconds on a pocket calculator. But this seems unlikely given the nature of the problem, and either way, it's just not that interesting for most people to put the kind of deep mathematical research into it that it would take.

Solving chess is not really a very interesting problem as pure mathematics. The whole interest was in beating human players with human-like strategies, which has thoroughly been achieved. The most interesting thing that remains is challenging humans that like chess at their own level of gameplay - since ultimately the only true purpose of chess is to entertain humans, and a machine that plays perfectly is actually completely unfun to play.

tucnak 7 days ago | parent [-]

To be fair, if you take your argument from the last paragraph, i.e. that the function of chess as a game is to entertain, your earlier argument re: min-max doesn't really stand, does it? I think you're right that chess is probably quite interesting in terms of abstract maths; surely there are ways to represent the pawns (pawn structures?) as well as the pieces (knights, bishops, etc.) in terms of some supersymmetry. However, it doesn't seem like much progress has been made in this area academically since the 20th century.

It may be helpful to tap into AlphaFold and related results for interpretability! Stockfish has incorporated some probabilistic programming (neural network based), but it's comparatively small-scale, and behind the SOTA of bleeding-edge Transformer architectures (in terms of interpretability, no less!). Surely, if we can't get supersymmetries in some complex form, we could get ahead with modern interpretability and RL techniques. Given the appropriate knowledge representation, by combining self-play with known playing sequences and behaviours, forcing the model into known lines, and perhaps partitioning by player style so the model has an incentive to learn style features, it should be possible for it to comfortably learn what we refer to as the essence of the game, i.e. archetypal human playing styles. Using insights learned from interpretability, it should be possible to further influence the model during inference.

If they were to get to that point, we could say that chess would be solved...

jibal 7 days ago | parent [-]

> I think you're right that chess is probably quite interesting in terms of abstract maths

They said it's not interesting.

> like surely there are ways to represent the pawns (pawn structures?) as well as the pieces (knights, bishops, etc.) in terms of some supersymmetry.

No, absolutely not. The confusion between pawns and pawn structures highlights how completely off base this is. The attack vectors of the various pieces are easily represented, but there's no generalization to "knight structures" or "bishop structures", and "supersymmetry" is completely irrelevant to chess.

> If they were to get to that point, we could say that chess would be solved...

No, solving a game has a specific meaning and that's not it.

tucnak 7 days ago | parent [-]

I wouldn't discount symmetries in chess on account of algebraic topology existing.

Once in a while, new maths is produced in one area, and somebody else picks it up and applies it in a completely different domain. I'm a bit familiar with chess engines and their source code. The board representation is just cache-friendly data structures for an 8x8 board, with heuristics on top. This is only a statement about our current understanding, heuristic or probabilistic (see AlphaZero); it doesn't mean a more fundamental, symmetrical structure doesn't exist. Rubik's cube was famously solved by fundamental insights from group theory. Now, chess is probably, but not definitely, a radically harder problem in terms of computational complexity, not least because there's an adversary and all of game logic applies. But we see it all the time in cryptanalysis, where new insights from maths people broke some very cool constructions out of not much but some bad priors.

Pure min-max search is inherently suboptimal if your goal is understanding the game. AlphaZero, and to a lesser extent Leela et al., have shown this, and indeed players incorporated all these ideas shortly thereafter. Of course, old tricks no longer provide an advantage now that they're known, but then again, that doesn't mean better interpretations in commentary, player training, etc. are not possible. None of the existing engines (heuristic, probabilistic, or both) are so far (a) bringing new maths to bear on chess positions, or (b) bringing new game representations that would lend themselves better to interpretability during inference.

To truly solve chess, a given engine must satisfy both (a) and (b).

Getting +EV from some intricate line, analysed as deep as is reasonable in the allotted time, does not bring you any closer to understanding the game. You could tell that the line works, of course, but only because you had already found the line in the first place! For the line to be possible, and profitable, something in the structure of your prior position has to allow it, or otherwise predict it.

jibal 7 days ago | parent [-]

> I wouldn't discount symmetries in chess

I didn't ... the word used was "supersymmetry". And likening chess to a Rubik's cube or cryptanalysis is just silly ... silly enough that I'm going to stop commenting. But:

> None of the existing engines (heuristic, probabilistic, or both) are so far (a) bringing new maths to bear on chess positions, or (b) bringing new game representations that would lend themselves better to interpretability during inference.

Sigh. The people developing chess engines are not idiots, and are strongly motivated to find the most effective algorithms.

mysecretaccount 7 days ago | parent | prev | next [-]

> We should have solved chess already.

> We should be aiming to solve chess, but we are not even trying.

We are trying and we should not expect to solve it because the search space is massive. There are more legal games of chess than atoms in the universe.

throw-qqqqq 7 days ago | parent | next [-]

> There are more legal games of chess than atoms in the universe

I’ll make an even stronger assertion:

There are more legal chess games of 40 moves or fewer than there are atoms in the universe.

https://en.wikipedia.org/wiki/Shannon_number

https://skeptics.stackexchange.com/questions/4165/are-there-...

NickC25 7 days ago | parent | prev | next [-]

True, but we can at the very least teach a computer to prune effectively unreachable positions, or openings that won't happen at a basic level, let alone in high-level or GM play.

For example, I don't think I've ever seen or will ever see: 1. a3 a6 2. h3 h6

The computer should be given the logic with which to solve chess by telling it, right off the bat, not to search certain lines or openings, because they are effectively useless.

jibal 7 days ago | parent [-]

You think no one has done that? Aside from explicitly encoding book lines, NN weights developed over millions of games steer chess engines away from useless lines.

But if you don't explore a line, even if you believe it's effectively useless, then you haven't solved chess.

P.S.

> I don't think I've ever seen or will ever see: 1. a3 a6 2. h3 h6

I have. Children and other beginners often do play like this. And even at GM levels we've seen nonsense like the Bongcloud and, marginally less nonsensically, the Cow opening.

NickC25 7 days ago | parent [-]

>But if you don't explore a line, even if you believe it's effectively useless, then you haven't solved chess.

That's actually a good point, because ideas that modern GMs found to be useless (in particular, locking down a wing via aggressive early wing play) have actually found a new home in AlphaZero's play.

What might look completely dumb as an early move (say, aggressively pushing the f-pawn from the start) might prove to be an incredible opening, but humans are too stupid to execute it perfectly.

GistNoesis 7 days ago | parent | prev | next [-]

This is precisely the point I am trying to make:

A candidate solution function can occupy a very small finite space: much smaller than the number of gamestates. And we can invalidate a candidate solution by finding a single counterexample.

Current chess engines can't be invalidated quickly: they are not trying to solve chess. They are not falsifiable. They are not attempting to precisely define the frontier, which I think is where the remaining effort should go.

We are just trying to encode the frontier, as we would with the Mandelbrot fractal.

Proving that a solution is valid is harder than finding a solution. Here I am suggesting we find the solution first.

Proving can also be done without exploring all legal chess positions. For example, once you have your function, you can use branch and bound to cull vast amounts of the state space. You just partition the state space and prove that the function is constant on each partition; for example, when one side has 8 queens and the other has only a king with freedom of movement, the position is winnable, and you don't have to check every gamestate.

I am stating that there is a good chance that, by searching for such a function, we will stumble upon one which has no counterexample, precisely because the complexity of chess is not infinite (an unproven conjecture: the non-Turing-completeness of adversarial chess). We should be looking for it.

charcircuit 7 days ago | parent | prev [-]

>We are trying

Who is currently trying? You make it sound like people think it is impossible and hence would not try.

>There are more legal games of chess than atoms in the universe.

This only matters if you are doing a pure brute force. Also comparing exponential numbers to a count is an unfair comparison where exponents easily win.

WJW 7 days ago | parent | prev | next [-]

> If the complexity of chess is finite this is possible.

This is like saying the distance to Mars is finite, so it should be possible to just go and live there. It's not theoretically impossible, but at this time it is practically impossible. There are real engineering challenges between here and there that have not yet been solved, and by the looks of it, it will be at least several more decades before we get there.

In your example, you gloss over the construction of a "finite time oracle" by just saying "just train it until it is perfect". If humanity knew how to do that we could solve a whole other (more interesting) set of problems, but we don't.

GistNoesis 7 days ago | parent [-]

>"just train it until it is perfect"

Yes, that's exactly the problem with the current approaches based on a "valuation" function.

They are not trying to aim for perfection, and therefore they can no longer make progress.

To progress, you must precisely define the frontier: an evaluation of 0.1 is not resolved to one of "white wins", "draw", or "white loses", which it theoretically must be. Such engines are not "committing" to anything.

To train such a network to perfection, you must avoid training it only on "average" game states; you must also mine for hard samples, the game states which define the frontier.

Find a candidate, find a violation, add it to the dataset of training examples, retrain to perfection on the growing dataset (or a generator of hard positions) to find a new candidate, and loop.
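A tabular toy of this loop, with the 1-2-3 subtraction game standing in for chess (an assumed stand-in; a real run would use a network and sampled positions rather than an exhaustive table):

```python
# Sketch: "find a violation, fix it, loop" as asynchronous value iteration.
# Game: n sticks, remove 1-3 per turn, taking the last stick wins.
def solve_by_violation_mining(max_sticks=15):
    f = {n: 0 for n in range(max_sticks + 1)}    # initial candidate: all draws

    def backup(n):
        if n == 0:
            return -1                            # no move: side to move has lost
        return max(-f[n - k] for k in (1, 2, 3) if k <= n)

    while True:
        violations = [n for n in f if f[n] != backup(n)]
        if not violations:                       # zero violations: "solved"
            return f
        for n in violations:                     # "retrain" on the hard cases
            f[n] = backup(n)

table = solve_by_violation_mining()
```

On this acyclic toy the loop provably converges to the exact oracle; whether a gradient-trained network on chess would converge the same way is the open conjecture being argued here.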

WJW 7 days ago | parent [-]

So what makes you think it is possible to precisely define such a frontier? And why should such a thing, if it is possible at all, be 1. doable by humans and 2. doable with the amount of energy and computing power available to us within the coming couple of decades?

taneq 7 days ago | parent | prev | next [-]

"We should have solved Busy Beaver, if the complexity of BB(n) is finite this is possible."

I mean, yeah, chess isn't THAT bad, but it's still not directly tractable, and besides, brute-forcing it is boring.

ARandumGuy 7 days ago | parent | prev [-]

I've got to ask, do you play much chess? Because this post reads like you don't understand much about chess.

The issue with "solving" chess isn't that there isn't an objectively best move in every position. The issue is that calculating that move is functionally impossible for most positions. That's because chess gets exponentially more complicated the more pieces there are on the board. For example, there are around 26 million positions with 5 or fewer pieces, and over 3.7 billion positions with 6 or fewer pieces.

And most of those positions are distinct. This isn't like a Rubik's cube, where there are a lot of functionally identical positions. Any experienced chess player can tell you that a single piece placement can be the difference between a winning position, and a losing position.

And this complexity is what I love about chess! I love the fact that I can enter positions that no one has ever encountered before just by playing some chess online. I love the fact that deep analysis is possible, but that the sheer size of the possibility space means we can never truly solve chess. Chess strikes a perfect balance of complexity. Any simpler, and evaluating the best move would be too easy. Any more complicated, and evaluation becomes so difficult that it's hardly worth trying.

Which isn't to say that we can't build computers that are very good at chess. A person hasn't beaten a top computer in decades. Magnus Carlsen is probably the greatest chess player to have ever lived, and you can run software on your phone that could easily beat him. But there's a wide gulf between "can beat every human alive" and "plays objectively perfect chess."