smokel 4 days ago

Thanks for pointing that out.

To be fair, MuZero only learns a model of the rules (its learned dynamics model) for navigating its search tree. To make actual moves, it gets the list of legal actions from the game engine, so at that level it does not learn the rules of the game.
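A rough sketch of that split, loosely modeled on how the published MuZero pseudocode expands the root with the game's legal actions but expands internal nodes with the full action space. The function names `masked_priors` and `expand` are hypothetical, and the real search does much more (value backup, visit counts, noise):

```python
import math

def masked_priors(policy_logits, actions):
    """Softmax over the given actions only; all other actions get no child."""
    logits = {a: policy_logits[a] for a in actions}
    m = max(logits.values())  # subtract max for numerical stability
    exps = {a: math.exp(l - m) for a, l in logits.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def expand(policy_logits, legal_actions=None):
    """Hypothetical node expansion.

    At the root, legal_actions comes from the real game engine, so only
    legal moves can ever be played. Inside the tree there is no engine to
    ask, so every action in the action space gets a child and only the
    learned model decides how promising it is.
    """
    if legal_actions is None:
        legal_actions = range(len(policy_logits))
    return masked_priors(policy_logits, legal_actions)

logits = [1.0, 2.0, 0.5]
root = expand(logits, legal_actions=[0, 2])  # engine says action 1 is illegal
inner = expand(logits)                       # no engine inside the tree
```

Here `root` has no entry for action 1, while `inner` still assigns it a prior, which is the sense in which the legality check comes from outside the learned model.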

(HRM possibly does the same, which would put it in the same category as MuZero. It probably makes a lot of illegal moves.)