chc4 3 hours ago

It's really funny reading the thought processes, where most of the time the agent doesn't actually remember trivial things about the cards it or its opponent is playing (thinking they have different mana costs or different effects, or mixing up one card's effect with another's). The fact that they're able to take game actions and win against other agents is cute, but it doesn't inspire much confidence.

The agents also constantly seem to evaluate whether they're "behind" or "ahead" based on board state, which is a weird way of thinking about most games and often hard to evaluate, especially for decks like control, which care more about resources like mana and card advantage and always plan on stabilizing late game.
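
To make the difference concrete, here's a minimal sketch (the state fields and weights are invented for illustration, not taken from the project): a board-centric evaluator only counts the battlefield and life totals, while a resource-aware one also credits cards in hand and mana development, which is what a control deck is actually playing for.

    # Hypothetical game-state evaluation, purely illustrative.
    from dataclasses import dataclass

    @dataclass
    class PlayerState:
        life: int
        creatures_power: int  # total power on the battlefield
        cards_in_hand: int
        lands: int

    def board_only_score(me: PlayerState, opp: PlayerState) -> float:
        # "Am I ahead?" judged purely on the battlefield.
        return (me.creatures_power - opp.creatures_power) + 0.5 * (me.life - opp.life)

    def resource_aware_score(me: PlayerState, opp: PlayerState) -> float:
        # Also credit card advantage and mana development, which a control
        # deck trades early board presence for. Weights are made up.
        return (board_only_score(me, opp)
                + 2.0 * (me.cards_in_hand - opp.cards_in_hand)
                + 1.0 * (me.lands - opp.lands))

    # A control player with an empty board but a full grip reads as hopelessly
    # "behind" on board alone, and roughly fine once resources count.
    me = PlayerState(life=12, creatures_power=0, cards_in_hand=7, lands=6)
    opp = PlayerState(life=20, creatures_power=6, cards_in_hand=2, lands=4)
    print(board_only_score(me, opp))      # -10.0
    print(resource_aware_score(me, opp))  # 2.0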

GregorStocks 2 hours ago | parent

You might be looking at really old games (meaning, like, Saturday) - I've made a lot of harness improvements recently that should make the "what does this card do?" hallucinations less common. But yeah, it still happens, especially with cheaper models - it's hard to balance "shoving everything they need into the context" against "avoiding paying a billion dollars per game or overwhelming their short-term memory". I think the real solution here will be to expose more powerful MCP tools and encourage the models to use them heavily, but most current models have problems with large MCP toolsets, so I'm leaving that as a TODO for now until solutions like Anthropic's https://www.anthropic.com/engineering/code-execution-with-mc... become widespread.
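
For what "expose more powerful MCP tools" could look like, here's a minimal sketch using the FastMCP helper from the official Python MCP SDK; the tool name and card database are invented stand-ins, not the project's actual harness:

    # Hypothetical card-lookup MCP tool: instead of stuffing every card's
    # rules text into the prompt, the agent fetches it on demand.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("card-oracle")

    # Stand-in for a real card database keyed by exact card name.
    CARDS = {
        "Lightning Bolt": {
            "mana_cost": "{R}",
            "text": "Lightning Bolt deals 3 damage to any target.",
        },
        "Counterspell": {
            "mana_cost": "{U}{U}",
            "text": "Counter target spell.",
        },
    }

    @mcp.tool()
    def lookup_card(name: str) -> str:
        """Return the exact mana cost and rules text for a card."""
        card = CARDS.get(name)
        if card is None:
            return f"No card named {name!r} found."
        return f"{name} {card['mana_cost']}: {card['text']}"

    if __name__ == "__main__":
        mcp.run()  # serves over stdio so the agent can call lookup_card

The appeal of this shape is that the "what does this card do?" question becomes a tool call instead of a memory test, at the cost of extra round trips per decision.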