minimaxir a day ago

That Google IO slide was somewhat misleading: the maintainer of Gemini Plays Pokemon had a much better agentic harness, one that was constantly iterated upon throughout the run (e.g. the maintainer had to give specific instructions on how to use Strength to get past Victory Road), unlike Claude Plays Pokemon.

The Elite Four/Champion was a non-issue in comparison, especially when you have a lv. 81 Blastoise.

fourier456 a day ago

Okay, wait, though: I want to see the full transcript, because that is actually a better / softer benchmark if you measure in terms of the human input needed.