Remix.run Logo
kenforthewin 3 hours ago

Nice work. I think games are a great way to benchmark AI, especially games that involve long term strategy. I recently built an agent harness for NetHack - https://glyphbox.app/ - like you I suspect that there's a lot you can do at the harness / tool level to improve performance with existing models.