kolinko 5 days ago

Two things I'm missing from the article:

- the testing prompt (were the LLMs instructed to progress in the game, as opposed to just exploring? The author said smarter LLMs were more likely to explore)

- a benchmark against humans