AnotherGoodName a day ago

I’ve been working on board game AI lately.

Fwiw nothing beats ‘implement the game logic in full (a huge amount of work) and, with pruning on some heuristics, look 50 moves ahead’. This is how chess engines work and how all good turn-based game AI works.

I’ve tried throwing masses of game state data at the latest models in PyTorch. Unusable. It makes really dumb moves. In fact, one big issue is that it often suggests invalid moves, and the best way to avoid this is to implement the board game logic in full to validate them. At which point, why don’t I just do the above and scan ahead X moves, since I have to do the hard part of manually building the world model anyway?

One area where current AI is helping is with the heuristics themselves for evaluating the best moves when scanning ahead. You can input various game states, together with whether the player ultimately won the game, to train the values of the heuristics. You still need to implement the world model and the look-ahead to use those heuristics, though! When you hear of neural networks being used for Go or chess, this is where they are used. You still need to build the world model and brute-force scan ahead.
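
To make that concrete: training the heuristic can be as simple as regressing a small network from game states to eventual outcomes. A minimal PyTorch sketch (the state encoding, layer sizes and names here are placeholders, not anything from a real engine):

    import torch
    import torch.nn as nn

    STATE_DIM = 128  # hypothetical: each board position encoded as a flat vector

    # Small value network: game state in, estimated win probability out.
    value_net = nn.Sequential(
        nn.Linear(STATE_DIM, 256),
        nn.ReLU(),
        nn.Linear(256, 1),
        nn.Sigmoid(),  # output in [0, 1]
    )

    optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()

    def train_step(states, outcomes):
        # states: (batch, STATE_DIM) float tensor
        # outcomes: (batch, 1) of 0.0/1.0 - did the player to move eventually win?
        optimizer.zero_grad()
        loss = loss_fn(value_net(states), outcomes)
        loss.backward()
        optimizer.step()
        return loss.item()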

One path I do want to try more: in theory, coding assistants should be able to read rulebooks and dynamically generate code to represent those rules. If you can do that part, the rest should be easy. I.e. it could be possible to throw rulebooks at an AI and have it play the game: it would generate a world model from the rulebook via coding assistants, then scan ahead more moves than humanly possible using that world model, evaluating against heuristics trained through trial and error.
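
The scan-ahead half of that plan is the well-understood part. Something like depth-limited negamax with alpha-beta pruning over whatever world model the assistant generates would do; the game interface and the evaluate stub below are hypothetical, just to show the shape:

    # Hypothetical world-model interface: game.legal_moves(s), game.apply(s, m),
    # game.is_terminal(s). `evaluate` stands in for the trained heuristic.
    def evaluate(state):
        return 0.0  # replace with a trained heuristic, scored for the side to move

    def negamax(game, state, depth, alpha=float('-inf'), beta=float('inf')):
        if depth == 0 or game.is_terminal(state):
            return evaluate(state), None
        best_move = None
        for move in game.legal_moves(state):
            score, _ = negamax(game, game.apply(state, move), depth - 1,
                               -beta, -alpha)
            score = -score  # the opponent's best line is our worst
            if score > alpha:
                alpha, best_move = score, move
            if alpha >= beta:
                break  # prune: the opponent will never allow this line
        return alpha, best_move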

Of course, coding assistants aren’t at a point where you can throw rulebooks at them and get an internal representation of game states. I should know: I just spent weeks building the game model even with a coding assistant.

PeterStuer 14 hours ago | parent | next [-]

"Elephants don't play chess" ;)

You have a tiny, completely known, deterministic, rule-based 'world'. 'Reasoning' forwards in that is trivial.

Now try your approach on much more 'fuzzy', incompletely and ill-defined environments, e.g. natural language production, and watch it go down in flames.

Different problems need different solutions. While current frontier LLMs show surprising results in emergent shallow and linguistic reasoning, they are far away from deep abstract logical reasoning. A SOTA theorem prover, otoh, can excel at that, but can still struggle to produce a coherent sentence.

I think most have always agreed that for certain tasks, an abstraction over which one can 'reason' is required. People differ in opinion over whether this faculty is to be 'crafted' in or whether it is possible to have it emerge implicitly, and more robustly, from observations and interactions.

https://people.csail.mit.edu/brooks/papers/elephants.pdf

AnotherGoodName 5 hours ago | parent [-]

What seems bizarre, though, is that the language problem was fully solved first (where 'fully solved' means the AI can learn it through pure observation, with no human intervention at all).

As in, language today is learnt by basically throwing raw data at an LLM. Board games such as chess still require a human to manually build a world model for the state-space search to work on. They are indeed totally different problems, but it's still shocking to me which one was fully solved first.

tim333 2 hours ago | parent [-]

>Board games such as chess still require a human to manually build a world model for the state space search to work on

That's not so. DeepMind's MuZero can learn most board games without even being told the rules.

smokel a day ago | parent | prev | next [-]

You probably know this, but things heavily depend on the type of board game you are trying to solve.

In Go, for instance, it does not help much to look 50 moves ahead. The complexity is way too high for this to be feasible, and determining who's ahead is far from trivial. It's in these situations that modern AI (reinforcement learning, deep neural networks) helps tremendously.
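
Back-of-the-envelope numbers make the point (the branching factors are the usual rough estimates, ~35 for chess and ~250 for Go):

    import math

    # b^d nodes at average branching factor b and depth d (in moves, not turns)
    for b, d in [(35, 50), (250, 50)]:
        print(f"b={b}: about 10^{round(d * math.log10(b))} nodes at depth {d}")
    # b=35: about 10^77 nodes at depth 50
    # b=250: about 10^120 nodes at depth 50

For scale, the observable universe holds around 10^80 atoms, so pruning and evaluation heuristics are doing nearly all of the work at any real depth.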

Also note that nobody said that using AI is easy.

AnotherGoodName a day ago | parent [-]

AlphaGo (and Stockfish, which another commenter mentioned) still has to search ahead using a world model. The AI training just helps with the heuristics for pruning and evaluating that search.

The big fundamental blocker to a generic ‘can play any game’ AI is the manual implementation of the world model. If you read the AlphaGo paper you’ll see ‘we started with nothing but an implementation of the game rules’. That’s the part we’re missing. It’s done by humans.
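
For reference, this is roughly where the trained networks plug into the hand-built search in the AlphaGo/AlphaZero family: a PUCT-style rule mixes the value estimates accumulated by the search with the policy network's priors. A sketch (the node fields are my naming, not DeepMind's code):

    import math
    from dataclasses import dataclass

    @dataclass
    class Node:
        prior: float            # policy network's probability for this move
        visits: int = 0
        value_sum: float = 0.0  # accumulated value-net / rollout results

    # Exploit moves with high mean value Q, explore moves the policy likes
    # (high prior) that have not been visited much yet.
    def select_child(children, c_puct=1.5):
        n_total = sum(ch.visits for ch in children)
        def puct(ch):
            q = ch.value_sum / ch.visits if ch.visits else 0.0
            u = c_puct * ch.prior * math.sqrt(n_total) / (1 + ch.visits)
            return q + u
        return max(children, key=puct)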

moyix a day ago | parent | next [-]

Note that MuZero did better than AlphaGo, without access to preprogrammed rules: https://en.wikipedia.org/wiki/MuZero

smokel a day ago | parent | next [-]

Minor nitpick: it did not use preprogrammed rules for scanning through the search tree, but it does use preprogrammed rules to enforce that no illegal moves are made during play.

hulium a day ago | parent [-]

During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no:

> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.

https://arxiv.org/pdf/1911.08265

skywhopper 21 hours ago | parent [-]

That is exactly what the commenter was saying.

Zacharias030 16 hours ago | parent | next [-]

It is consistent with what the commenter was saying.

In any case, for Go - with a mild amount of expert knowledge - this limitation is most likely quite irrelevant except in very rare endgame situations or special superko setups, where a lack of legal moves or solutions pushes some probability onto moves that look like wishful thinking.

I think this is not a significant limitation of the work (not that any parent claimed otherwise). MuZero is acting in an environment with prescribed actions; it’s just “planning with a learned model”, without access to the simulation environment.

---

What I am less convinced by is the claim that MuZero reaches higher performance than previous AlphaZero variants. What is the comparison based on? Iso-FLOPs, iso-search-depth, iso-self-play-games, iso-wallclock-time? What would make sense here?

Each AlphaGo variant was trained on some sort of embarrassingly parallel compute cluster, but every paper included the punchline for general audiences that “in just 30 hours” some performance level was reached.

gnfargbl 20 hours ago | parent | prev [-]

The more detailed clarification of what "preprogrammed rules" actually means in this case made the entire discussion significantly clearer to me. I think it was helpful.

CGamesPlay 14 hours ago | parent | prev [-]

This is true, and MuZero's paper notes that it did better with less computation than AlphaZero. But it still used about 10x more computation to get there than AlphaGo, which was "bootstrapped" with human expert moves. I think this is very important context for anyone who is trying to implement an AI for their own game.

smokel a day ago | parent | prev [-]

Implementing a world model seems to be mostly solved by LLMs. Finding one that can be evaluated fast enough to actually solve games is extremely hard, for humans and AI alike.

skywhopper 21 hours ago | parent [-]

What are you talking about?

smokel 11 hours ago | parent [-]

Optimization is harder than writing out the rules of a game.

For most board games, it is trivial to describe all possible next states, but it is not at all trivial to search through all of these to find the best action to take.
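
A tiny illustration of that asymmetry, using tic-tac-toe only because it fits in a few lines: the successor function is trivial, but even here the naive game tree behind it is close to a million nodes:

    # Describing the next states is the trivial part...
    def successors(board, player):  # board: tuple of 9 cells, '' = empty
        for i, cell in enumerate(board):
            if cell == '':
                yield board[:i] + (player,) + board[i + 1:]

    # ...searching the tree they induce is the part that blows up.
    def tree_size(board, player):
        nxt = 'O' if player == 'X' else 'X'
        return 1 + sum(tree_size(child, nxt) for child in successors(board, player))

    print(tree_size(('',) * 9, 'X'))  # 986410 nodes, without even stopping at wins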

daxfohl a day ago | parent | prev | next [-]

Yeah, I can't even get them to retain a simple state. I've tried having them run a maze, but instead of giving them the whole maze up front, I have them move one step at a time, tell them which directions are open from that square and ask for the next move, etc.

After a few moves they get hopelessly lost and just start wandering back and forth in a loop. Even when I prompt them explicitly to serialize a state representation of the maze after each step, and even if I prune the old context so they don't get tripped up on old state representations, they still get flustered and corrupt the state or lose track of things eventually.

They get the concept: if I explain the challenge and ask them to write a program to solve such a maze step-by-step like that, they can do it successfully first-try! But maintaining the state internally, they still seem to struggle.

nomadpenguin a day ago | parent | next [-]

There are specialized architectures (the Tolman-Eichenbaum Machine)* that are able to complete this kind of task. Interestingly, once trained, their activations look strikingly similar to place and grid cells in real brains. The team were also able to show (in a separate paper) that the TEM is mathematically equivalent to a transformer.

* https://www.sciencedirect.com/science/article/pii/S009286742...

kqr 14 hours ago | parent | prev | next [-]

My experience in trying to get them to play text adventures[1] is similar. I had to prompt with very specific leading questions to give them a decent chance of even recognising the main objective after the first few steps.

[1]: https://entropicthoughts.com/getting-an-llm-to-play-text-adv...

warrenm a day ago | parent | prev | next [-]

>I've tried having them run a maze, but instead of giving them the whole maze up front, I have them move one step at a time, tell them which directions are open from that square and ask for the next move, etc.

Presuming these are 'typical' mazes (like you find in a garden or local corn field in late fall), why not have the bot run the known-correct solving algorithm (or its mirror)?

daxfohl a day ago | parent [-]

Like I said, they can implement the algorithm to solve it, but when forced to maintain the state themselves, either internally or explicitly in the context, they are unable to do so and get lost.

Similarly, if you ask them to write a Sudoku solver, they have no problem. And if you ask an online model to solve a Sudoku, it'll write a Sudoku solver in the background and use that to solve it. But (at least the last time I tried, a year ago) if you ask them to solve it step-by-step using pure reasoning, without writing a program, they start spewing out all kinds of nonsense (but humorously cheat: they'll still spit out the correct answer at the end).
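
For reference, the solver they produce so easily is just plain backtracking, something like this sketch:

    # Find an empty cell, try each digit, recurse, undo on failure.
    def solve(grid):  # grid: 9x9 lists of ints, 0 = empty
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    for v in range(1, 10):
                        if ok(grid, r, c, v):
                            grid[r][c] = v
                            if solve(grid):
                                return True
                            grid[r][c] = 0  # undo and try the next digit
                    return False  # nothing fits here: backtrack
        return True  # no empty cells left: solved

    def ok(grid, r, c, v):
        if v in grid[r] or any(grid[i][c] == v for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))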

prewett 18 hours ago | parent | next [-]

That’s because there are lots of maze-solving algorithms on the web, so it’s easy to spit one back at you. But since they don’t actually understand how to solve a maze, or even how to apply an algorithm one step at a time, it doesn’t work well.

warrenm 19 minutes ago | parent | prev | next [-]

You do not need to remember state with the simplest solver:

- place your right hand on the right wall
- walk forward, never letting your hand leave the wall
- arrive at the exit

Yes, you travel many dead ends along the way, but you are guaranteed to get to the end of a 'traditional' maze.
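
The follower needs almost no memory: just the current position and heading. A sketch, with a toy encoding where open_dirs maps each square to its open headings (and note it only works for simply connected, i.e. 'traditional', mazes):

    DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # N, E, S, W, clockwise

    def wall_follow(open_dirs, start, goal, heading=1):
        pos, path = start, [start]
        while pos != goal:
            for turn in (1, 0, -1, 2):  # prefer right, then straight, left, back
                h = (heading + turn) % 4
                if h in open_dirs[pos]:
                    heading = h
                    pos = (pos[0] + DIRS[h][0], pos[1] + DIRS[h][1])
                    path.append(pos)
                    break
        return path

    # Toy 1x3 corridor: (0,0) -> (0,1) -> (0,2)
    corridor = {(0, 0): {1}, (0, 1): {1, 3}, (0, 2): {3}}
    print(wall_follow(corridor, (0, 0), (0, 2)))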

adventured a day ago | parent | prev [-]

So if you push e.g. Claude Sonnet 4 or Opus 4.1 into a maze scenario, have it record its own pathing as it goes, and then refresh and feed the next Claude the progress so far, would that solve the inability to maintain long-duration context in such maze cases?

I make Claude do that on every project. I call them Notes for Future Claude and have it write notes for itself because of how quickly context accuracy erodes. It tends to write rather amusing notes to itself in my experience.

daxfohl a day ago | parent | next [-]

This was from a few months ago, so things may be different now. I only used OpenAI models, and o3 did by far the best. GPT-4o's performance was equivalent on the basic scenario where I had it just move one step at a time (which was still pretty good, all things considered), but when I started having it summarize state and such, o3 was able to use that to improve performance, whereas 4o actually got worse.

But yeah, that's one of the things I tried. "Your turn is over. Please summarize everything you have learned about the maze so someone else can pick up where you left off." It did okay, but it often included superfluous information; it sometimes forgot to include the current orientation (the maze actions were "move forward", "turn right", and "turn left", so knowing the current orientation was important); and it always forgot to include instructions on how to interpret the state: in particular, which absolute direction corresponded to an increase or decrease of which grid index.
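
In hindsight, a fixed hand-off schema would have pinned down exactly the fields it kept dropping. Something like this (hypothetical, just to illustrate):

    from dataclasses import dataclass, field

    # Makes explicit the two things the model kept omitting: the current
    # orientation, and which absolute direction maps to which index change.
    @dataclass
    class MazeState:
        position: tuple   # (row, col)
        orientation: str  # "N" | "E" | "S" | "W"
        # convention to spell out for the next model:
        #   "N": row - 1,  "S": row + 1,  "W": col - 1,  "E": col + 1
        visited: set = field(default_factory=set)      # squares seen so far
        open_dirs: dict = field(default_factory=dict)  # (row, col) -> open directions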

I even tried to coax it into defining a formal state representation and "instructions for an LLM to use it" up-front, to see if it would remember to include the direction/index correspondence, but it never did. It was amusing actually; it was apparent it was just doing whatever I told it and not thinking for itself. Something like

"Do you think you should include a map in the state representation? Would that be useful?"

"Yes, great idea! Here is a field for a map, and an algorithm to build it"

"Do you think a map would be too much information?"

"Yes, great consideration! I have removed the map field"

"No, I'm asking you. You're the one that's going to use this. Do you want a map or not?"

"It's up to you! I can implement it however you like!"

Mars008 16 hours ago | parent | prev [-]

> have it write notes for itself because of how quickly context accuracy erodes. It tends to write rather amusing notes to itself in my experience.

Just wondering: would it help to ask it to write the notes to someone else? Because the model itself wasn't in its training set, writing to "itself" may be confusing.

yberreby 17 hours ago | parent | prev [-]

It took me a second to realize you were talking about prompting an LLM. This is fundamentally different from what the parent is doing. "AI" is so much more than "talking to a pretrained LLM."

coeneedell a day ago | parent | prev | next [-]

IIRC the rules system for Magic: The Gathering Arena is generated by a sort of compiler fed the rules. You might not even need a modern coding assistant to build out something reasonable: get the DSL right first, then have people (or an LLM after fine-tuning) transform rulebooks into the DSL.

Crespyl 20 hours ago | parent [-]

They have an interesting write up here: https://magic.wizards.com/en/news/mtg-arena/on-whiteboards-n...

There's a lisp variant involved, and IIRC even a parser that reads the card text to auto-generate the rules code for most of the cards.
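
The write-up doesn't show the actual syntax, so this is a purely invented illustration of the idea: natural-language card text on one side, a lisp-ish machine-readable rule on the other:

    # Invented example; the real MTG Arena DSL is not shown in the write-up.
    card_text = "When this creature enters the battlefield, you gain 3 life."
    rule = "(on (enter-battlefield self) (gain-life (controller self) 3))"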

fennecbutt 4 hours ago | parent [-]

Tho tbf there are plenty of cards with what are essentially footnotes. They say reading the card explains the card, but that's not always the case; sometimes there's nuance, because MTG has so many fucking crazy interactions and the whole stack thing.

I haven't played in a month or two, but now I'm getting that itch again aha. When's Bloomburrow 2? Enough of this UB crap.

MachineBurning 8 hours ago | parent | prev | next [-]

> Fwiw nothing beats ‘implement the game logic in full (a huge amount of work) and, with pruning on some heuristics, look 50 moves ahead’. This is how chess engines work and how all good turn-based game AI works.

For board games this is mostly true. For turn-based games in general, it is not. It's certainly not true to say that "all good turn-based game AI" works like this.

Turn-based games where multiple "moves" are allowed per turn can very quickly have far too many branches to look ahead more than a very small number of turns. In board games you might have something like Warhammer or Blood Bowl [1], where there are many possible actions and the order of actions within a turn matters.

For computer games you might look at Screeps [2] or the Lux multi-agent AI competitions [3], which both have multiple "units" per player, where each unit may have multiple possible actions. You can easily reach a combinatorial explosion where any attempt at modeling future states of the world fails and you have to fall back on pure heuristics (see the quick arithmetic after the links).

[1]https://en.wikipedia.org/wiki/Blood_Bowl

[2]https://screeps.com/

[3]https://www.kaggle.com/competitions/lux-ai-season-2
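
The arithmetic behind that explosion: with u units each independently choosing one of a actions, a single turn already has a^u joint actions, and sequences over turns compound that:

    import math

    u, a, t = 10, 5, 3  # 10 units, 5 actions each, 3 turns (illustrative numbers)
    per_turn = a ** u
    print(f"{per_turn:,} joint actions per turn")  # 9,765,625
    print(f"about 10^{round(t * math.log10(per_turn))} three-turn sequences")  # ~10^21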

bubblyworld a day ago | parent | prev | next [-]

Something to consider is that while it's really hard to implement a decent NN-based algorithm like AlphaZero for your game, you get the benefit that model checkpoints give you a range of skill levels to play against as you train it.

Handicapping traditional tree search produces really terrible results, imo. It's common for weak chess engines to be weak for stupid reasons (they just hang pieces, make random unnatural moves, miss blatant threats, etc.). Playing weak versions of Leela Chess really "feels" like a (bad) human opponent by contrast.

Maybe the juice isn't worth the squeeze. It's definitely a ton of work to get right.

deepsquirrelnet 21 hours ago | parent | prev | next [-]

> I’ve tried throwing masses of game state data at the latest models in PyTorch. Unusable. It makes really dumb moves. In fact, one big issue is that it often suggests invalid moves, and the best way to avoid this is to implement the board game logic in full to validate them.

It sounds like you need RL. You could try setting up some reward functions with evaluators. I’m not sure what your architecture is, but it’s something to try.
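
As a sketch of what such a reward might look like (the values are arbitrary choices to tune, not a recommendation): a terminal win/loss signal, plus a penalty for illegal moves instead of masking them out:

    def reward(outcome, move_was_legal):
        # outcome: 'win' | 'loss' | 'draw' | None while the game continues
        if not move_was_legal:
            return -1.0  # penalize illegal moves rather than masking them
        return {'win': 1.0, 'loss': -1.0}.get(outcome, 0.0)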

robertlagrant 21 hours ago | parent | prev | next [-]

How does this experience translate to non-turn-based games? AlphaStar presumably is doing something other than searching all the possible moves. Why wouldn't whatever it does translate to turn-based games?

ChaitanyaSai 11 hours ago | parent | prev | next [-]

Interesting! Documenting this anywhere?

red75prime a day ago | parent | prev | next [-]

It would be nice if you could train a decent model on a $1000 (or so) budget, but for now it seems unlikely.

GaggiX a day ago | parent | prev | next [-]

>This is how chess engines work

All of the strongest chess engines have at least one neural network to evaluate positions, including Stockfish, and this impacts the search window.

>how all good turn-based game AI works

That's not really true; just think of Go.

skywhopper 21 hours ago | parent [-]

??? Chess engines and Go engines have as a baseline a world model of the state of the game and what moves are legal.

GaggiX 20 hours ago | parent [-]

>Fwiw nothing beats ‘implement the game logic in full (a huge amount of work) and, with pruning on some heuristics, look 50 moves ahead’. This is how chess engines work and how all good turn-based game AI works.

Just read the parent comment.

jjk7 a day ago | parent | prev [-]

Interesting, the parallels between LLM development and psychology & spirituality.

To have true thinking, you need an internal adversary challenging thoughts and beliefs. To look 50 moves ahead, you need to simulate the adversary's moves... Duality.