Remix.run Logo
danielvinson 2 hours ago

As a former competitive MtG player this is really exciting to me.

That said, I reviewed a few of the Legacy games (the format I'm most familiar with and also the hardest by far), and the level of play was so low that I don't think any of the results are valid. It's very possible for Legacy they would need some assistance for playing Blue decks, but they seem to not be able to know the most basic of concepts - Who's the beatdown?.

IMO the most important pars of current competitive Magic is mulligans and that's something an LLM should be extremely good at but none of the games I'm seeing had either player starting with less than 7 cards... in my experience about 75% of games in Legacy have at least one player mulligan their opener.

GregorStocks an hour ago | parent [-]

Yeah, the intention here is not to answer "which deck is best" - the standard of play is nowhere near high enough for that. It's meant as more of a non-saturated benchmark for different LLM models, so you can say things like "Grok plays as well as a 7-year-old, whereas Opus is a true frontier model and plays as well as a 9-year-old". I'm optimistic that with continued improvements to the harness and new model releases we can get to at least "official Pro Tour stream commentator" skill levels within the next few years.