▲ | Zacharias030 16 hours ago | |
It is consistent with what the commenter was saying. In any case, for Go - with a mild amount of expert knowledge - this limitation is most likely quite irrelevant unless in very rare endgame situations, or special superko setups, where a lack of moves or solutions push some probability to moves that look like wishful thinking. I think this is not a significant limitation of the work (not that any parent claimed otherwise). MuZero is acting in an environment with prescribed actions, it’s just “planning with a learned model” and without access to the simulation environment. —- What I am less convinced by was the claim that MuZero reaches higher performance than previous AlphaZero variants. What is the comparison based on? Iso-flops, Iso-search depth, iso self play games, iso wallclock time? What would make sense here? Each AlphaGo paper was trained on some sort of embarrassingly parallel compute cluster, but all included the punchlines for general audiences that “in just 30 hours” some performance level was reached. |