| ▲ | kenforthewin 3 hours ago | |
Nice work. I think games are a great way to benchmark AI, especially games that involve long term strategy. I recently built an agent harness for NetHack - https://glyphbox.app/ - like you I suspect that there's a lot you can do at the harness / tool level to improve performance with existing models. | ||