smokel a day ago
Minor nitpick: it does not use preprogrammed rules for scanning through the search tree, but it does use preprogrammed rules to ensure that no illegal moves are made during play.
hulium a day ago
During play, yes: obviously you need an implementation of the game to play it. But in its planning tree, no:

> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.
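A toy sketch of the difference the quote describes, assuming the policy network outputs one logit per action (this is not MuZero's actual code; the function names and numbers are made up for illustration). At the root, the real game is available, so illegal actions are dropped and the priors are renormalised over the legal ones. Deeper nodes are the model's own predicted states with no rulebook to consult, so every action keeps whatever prior the network gives it, and it's the training that pushes the prior on never-seen moves toward zero.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def root_priors(logits: Sequence[float], legal_actions: Sequence[int]) -> Dict[int, float]:
    """Root node: the real environment can be queried for legal moves,
    so illegal actions are removed and the rest are renormalised."""
    probs = softmax([logits[a] for a in legal_actions])
    return dict(zip(legal_actions, probs))


def interior_priors(logits: Sequence[float]) -> Dict[int, float]:
    """Interior node: the state is a learned latent prediction, so there is
    no environment to ask. Every action keeps its predicted prior (no mask)."""
    return dict(enumerate(softmax(logits)))


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical policy output for 4 actions
    legal = [0, 3]                  # hypothetical: only actions 0 and 3 are legal
    print(root_priors(logits, legal))  # only keys 0 and 3, summing to 1
    print(interior_priors(logits))     # all 4 actions, illegal ones included
```

The asymmetry is the point: masking requires ground-truth rules, which only exist at the root, while inside the tree the search relies on the learned policy having already assigned near-zero probability to moves that never appear in its training trajectories.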