smokel 4 days ago

If I understand this correctly, it learns the rules of Sudoku by looking at 1,000 examples of (puzzle, solution) pairs. It is then able to solve previously unseen puzzles with 55% accuracy. If given millions of examples, it becomes almost perfect.

This is apparently without pretraining of any sort, which is kind of amazing. In contrast, systems like AlphaZero have the rules of Go or chess built in, and learn only the strategy, not the rules.

Off to their GitHub repository [1] to see this for myself.

[1] https://github.com/sapientinc/HRM

babel_ 4 days ago | parent | next [-]

AlphaZero may have the rules built in, but MuZero and the other follow-ups didn't. MuZero not only matched or surpassed AlphaZero, but it did so with less training, especially in the EfficientZero variant; notably also on the Atari playground.

gavmor 4 days ago | parent | next [-]

This is "The Bitter Lesson" of AI, no? "More compute beats clever algorithm."

adastra22 3 days ago | parent | next [-]

> MuZero not only matched or surpassed AlphaZero, but it did so with less training

Seems the opposite?

babel_ 3 days ago | parent | prev [-]

Quite the opposite, a clever algorithm needs less compute, and can leverage extra compute even more.

gavmor 3 days ago | parent [-]

Apologies, "clever" is a poor paraphrase of "domain-specific", or "methods that leveraged human understanding."[0]

0. http://www.incompleteideas.net/IncIdeas/BitterLesson.html

smokel 4 days ago | parent | prev [-]

Thanks for pointing that out.

To be fair, MuZero only learns a model of the rules for navigating its search tree. To make actual moves, it gets a list of valid actions from the game engine, so at that level it does not learn the rules of the game.

(HRM possibly does the same, and could be in the same realm as MuZero. It probably makes a lot of illegal moves.)
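To make that distinction concrete, here is a minimal sketch of an agent choosing a move only from the actions an engine reports as legal. It is a simplification for illustration, not MuZero's actual search; `pick_action`, `policy_scores`, and `legal_actions` are hypothetical names.

```python
def pick_action(policy_scores, legal_actions):
    """Return the highest-scoring action among those reported as legal.

    policy_scores: sequence of scores indexed by action id (the learned part).
    legal_actions: iterable of action ids supplied by the game engine
                   (assumption: the rules live here, not in the model).
    """
    legal_actions = list(legal_actions)
    if not legal_actions:
        raise ValueError("the engine reported no legal actions")
    # The model may score an illegal action highest; the mask hides that.
    return max(legal_actions, key=lambda action: policy_scores[action])


# Action 1 has the best score but is not legal, so action 2 is chosen.
chosen = pick_action([0.1, 0.9, 0.3], legal_actions=[0, 2])
```

Without the `legal_actions` filter, nothing stops the model from proposing action 1, which is the sense in which such a system has not learned the rules at the move-selection level.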

smokel 4 days ago | parent | prev [-]

To follow up, after experimenting a bit with the source code:

1. Please, for the love of God, and for scientific reproducibility, specify library versions explicitly, and use pyproject.toml instead of an incomplete requirements.txt.

2. The 1,000 Sudoku examples are augmented with hand-coded permutation algorithms, so the actual input data set is more like 1,000,000 examples, not 1,000.
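For anyone wondering what such permutations look like, here is a rough sketch of validity-preserving Sudoku transforms (digit relabeling, band and row shuffles, column shuffles, transpose). This is not the repository's code; `augment` and `is_valid_solution` are hypothetical names, and the choice of transforms is an assumption.

```python
import random


def is_valid_solution(grid):
    """Check that every row, column, and 3x3 box holds the digits 1..9."""
    full = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(unit == full for unit in rows + cols + boxes)


def augment(puzzle, solution, rng):
    """Apply one random transform to a (puzzle, solution) pair.

    Grids are 9x9 lists of ints; 0 marks a blank cell in the puzzle.
    The same transform is applied to both grids so they stay consistent.
    """
    # Relabel digits with a random permutation of 1..9; 0 (blank) stays 0.
    digits = list(range(1, 10))
    rng.shuffle(digits)
    relabel = [0] + digits

    def shuffled_axis():
        # Shuffle the three bands, then the three lines inside each band.
        bands = [0, 1, 2]
        rng.shuffle(bands)
        order = []
        for band in bands:
            lines = [0, 1, 2]
            rng.shuffle(lines)
            order.extend(3 * band + line for line in lines)
        return order

    rows, cols = shuffled_axis(), shuffled_axis()
    transpose = rng.random() < 0.5

    def apply(grid):
        out = [[relabel[grid[r][c]] for c in cols] for r in rows]
        return [list(col) for col in zip(*out)] if transpose else out

    return apply(puzzle), apply(solution)
```

Each transform maps a valid grid to another valid grid, so one solved puzzle can be expanded into many training pairs without hand-labeling anything new.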

rudedogg 4 days ago | parent | next [-]

Do you have a fork or the changes? I might take a look, and Python dependency hell on a Sunday is no good.

mkagenius 3 days ago | parent | prev [-]

> specify library versions explicitly

Sometimes even that is not helpful. It's a pain we have to deal with.

gavinray 3 days ago | parent [-]

How is it not helpful?

A dependency lock file with resolved versions for both direct and transitive dependencies = reproducible build

blincoln 3 days ago | parent | next [-]

I don't know how common this is, but the fschat library maintainers went for at least a year without making an official release or updating the version number in their GitHub repo. The only way to have both current code and a reproducible build (without just including the fschat library directly, of course) was to pin it to a particular GitHub commit hash. That would get you code that was current, but with the version number from 12+ months earlier.

fschat is pretty popular for LLM-related work, so I assume this is at least not unheard-of for other notable third-party libraries.

mkagenius 3 days ago | parent | prev [-]

I don't remember the exact scenario, but it might have been related to the underlying Python or some system library being a little different, and then the dependency lock not being compatible with it.
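One way to catch that kind of mismatch is to record the interpreter and platform next to the package versions. A minimal sketch using only the standard library follows; `environment_fingerprint` is a hypothetical helper, not something from the HRM repository.

```python
import platform
from importlib import metadata


def environment_fingerprint(packages):
    """Collect interpreter, platform, and installed package versions.

    packages: iterable of distribution names to look up.
    A package that is not installed is recorded as None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }
```

Comparing two such fingerprints shows whether a failing install differs in the Python version or platform rather than in the pinned packages themselves.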