smokel a day ago

You probably know this, but things heavily depend on the type of board game you are trying to solve.

In Go, for instance, it does not help much to look 50 moves ahead. The complexity is far too high for this to be feasible, and determining who's ahead is far from trivial. It's in these situations that modern AI (reinforcement learning, deep neural networks) helps tremendously.

Also note that nobody said that using AI is easy.

AnotherGoodName a day ago | parent [-]

AlphaGo (and Stockfish, which another commenter mentioned) still has to search ahead using a world model. The AI training just helps with the heuristics for pruning and evaluating that search.
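
To make the split between the world model and the evaluation heuristic concrete, here is a minimal sketch on a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins). It uses negamax with alpha-beta pruning, which is closer to what Stockfish does; AlphaGo uses Monte Carlo tree search instead. All names (`legal_moves`, `apply_move`, `heuristic`, `negamax`) are made up for illustration, and `heuristic` is only a stand-in for a trained value network.

```python
from typing import Callable, List, Optional, Tuple

# Hand-written world model for toy Nim: the state is the number of stones left.
def legal_moves(stones: int) -> List[int]:
    return [m for m in (1, 2, 3) if m <= stones]

def apply_move(stones: int, move: int) -> int:
    return stones - move

def heuristic(stones: int) -> float:
    # Stand-in for a learned evaluation, scored for the player to move.
    return -1.0 if stones % 4 == 0 else 1.0

def negamax(
    stones: int,
    depth: int,
    alpha: float,
    beta: float,
    evaluate: Callable[[int], float],
) -> Tuple[float, Optional[int]]:
    """Return (value, best_move) from the point of view of the player to move."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1.0, None
    if depth == 0:
        # Search budget exhausted: fall back on the evaluation heuristic.
        return evaluate(stones), None
    best_value, best_move = float("-inf"), None
    for move in legal_moves(stones):
        child_value, _ = negamax(apply_move(stones, move), depth - 1, -beta, -alpha, evaluate)
        value = -child_value
        if value > best_value:
            best_value, best_move = value, move
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # prune: the opponent would never allow this line
    return best_value, best_move

value, move = negamax(7, 10, float("-inf"), float("inf"), heuristic)
print(value, move)
```

The search itself only ever touches the game through `legal_moves` and `apply_move`; swapping in a better `evaluate` changes how shallow the search can be, not whether the rules are needed.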

The big fundamental blocker to a generic ‘can play any game’ AI is the manual implementation of the world model. If you read the AlphaGo paper you’ll see ‘we started with nothing but an implementation of the game rules’. That’s the part we’re missing. It’s done by humans.

moyix a day ago | parent | next [-]

Note that MuZero did better than AlphaGo, without access to preprogrammed rules: https://en.wikipedia.org/wiki/MuZero

smokel a day ago | parent | next [-]

Minor nitpick: it did not use preprogrammed rules for scanning through the search tree, but it does use preprogrammed rules to enforce that no illegal moves are made during play.

hulium a day ago | parent [-]

During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no:

> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.

https://arxiv.org/pdf/1911.08265
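
A rough sketch of what "masking only at the root" could look like, assuming the policy network returns one prior probability per action. This is not MuZero's actual code; `mask_and_renormalize` and `node_priors` are hypothetical helpers for illustration.

```python
from typing import List, Optional, Sequence

def mask_and_renormalize(priors: Sequence[float], legal: Sequence[bool]) -> List[float]:
    """Zero out illegal actions and rescale the rest so they sum to 1."""
    masked = [p if ok else 0.0 for p, ok in zip(priors, legal)]
    total = sum(masked)
    if total == 0.0:
        # Assumption: fall back to a uniform prior over the legal actions.
        n_legal = sum(1 for ok in legal if ok)
        return [1.0 / n_legal if ok else 0.0 for ok in legal]
    return [p / total for p in masked]

def node_priors(
    priors: Sequence[float],
    is_root: bool,
    legal: Optional[Sequence[bool]] = None,
) -> List[float]:
    # Root: the real environment can be queried, so illegal moves are masked.
    if is_root and legal is not None:
        return mask_and_renormalize(priors, legal)
    # Inside the tree: no environment, so the network's priors are used as-is.
    return list(priors)

print(node_priors([0.5, 0.25, 0.25], is_root=True, legal=[True, False, True]))
print(node_priors([0.5, 0.25, 0.25], is_root=False))
```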

skywhopper 21 hours ago | parent [-]

That is exactly what the commenter was saying.

Zacharias030 16 hours ago | parent | next [-]

It is consistent with what the commenter was saying.

In any case, for Go - with a mild amount of expert knowledge - this limitation is most likely irrelevant except in very rare endgame situations or special superko setups, where a lack of moves or solutions pushes some probability onto moves that look like wishful thinking.

I think this is not a significant limitation of the work (not that any parent claimed otherwise). MuZero is acting in an environment with prescribed actions; it’s just “planning with a learned model”, without access to the simulation environment.

---

What I am less convinced by is the claim that MuZero reaches higher performance than previous AlphaZero variants. What is the comparison based on: iso-FLOPs, iso-search-depth, iso-self-play-games, or iso-wall-clock time? What would make sense here?

Each AlphaGo paper trained its system on some sort of embarrassingly parallel compute cluster, but all included the punchline for general audiences that some performance level was reached “in just 30 hours”.

gnfargbl 20 hours ago | parent | prev [-]

The more detailed clarification of what "preprogrammed rules" actually means in this case made the entire discussion much clearer to me. I think it was helpful.

CGamesPlay 14 hours ago | parent | prev [-]

This is true, and MuZero's paper notes that it did better with less computation than AlphaZero. But it still used about 10x more computation to get there than AlphaGo, which was "bootstrapped" with human expert moves. I think this is very important context for anyone who is trying to implement an AI for their own game.

smokel a day ago | parent | prev [-]

Implementing a world model seems to be mostly solved by LLMs. Finding one that can be evaluated fast enough to actually solve games is extremely hard, for humans and AI alike.

skywhopper 21 hours ago | parent [-]

What are you talking about?

smokel 11 hours ago | parent [-]

Optimization is harder than writing out the rules of a game.

For most board games, it is trivial to describe all possible next states, but it is not at all trivial to search through all of these to find the best action to take.
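
As a small illustration using tic-tac-toe (my choice of example, with made-up function names): generating the next states takes a few lines, while even this tiny game already has hundreds of thousands of complete move sequences to search through.

```python
from typing import List, Tuple

Board = Tuple[str, ...]  # 9 cells, each "X", "O" or " "

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def has_winner(board: Board) -> bool:
    return any(board[a] != " " and board[a] == board[b] == board[c]
               for a, b, c in LINES)

def next_states(board: Board, player: str) -> List[Board]:
    """The 'rules': every board reachable in one move by `player`."""
    if has_winner(board):
        return []  # the game is already over
    return [board[:i] + (player,) + board[i + 1:]
            for i in range(9) if board[i] == " "]

def count_games(board: Board, player: str = "X") -> int:
    """The 'search': count every complete game by brute force."""
    children = next_states(board, player)
    if not children:
        return 1  # terminal position: a win or a full board
    other = "O" if player == "X" else "X"
    return sum(count_games(child, other) for child in children)

empty: Board = (" ",) * 9
print(len(next_states(empty, "X")))  # 9 possible first moves
print(count_games(empty))            # 255168 complete games
```

Brute force is still fine at this size; for games like Go the same approach is hopeless, which is where pruning and learned evaluation come in.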