Remix.run Logo
OwenCR 4 hours ago

Sadly this benchmark removes the part of MTG that is most interesting: the opponent(s). Without opponents you simply don't have a game. You just have a rules engine - quite boring!

I think I object more to the decks used in testing than the machines' decisions. I do have nit picks though: This hand is quite poor and should be mulliganned: https://app.mtgautodeck.com/public/benchmarks/4bd9955b-ebe1-.... The poor runout reinforces this decision.

This project is cool though, props for making it!

CallumFerg 4 hours ago | parent [-]

Admittedly, the mulligan phase system prompt is the weakest part of the project. I had to add heuristics to stop the LLMs from mulliganing down to just a few cards looking for a perfect hand. The scoring for the benchmark is mostly based on if the LLM could complete legal turns, not good turns.

https://github.com/CallumFerguson/mtg-auto-deck/blob/a877c08...