CamperBob2 5 hours ago:
Without reading the .pdf, I tried the first game it gave me, at https://arcprize.org/tasks/ls20, and I couldn't begin to guess what I was supposed to do. Not sure what this benchmark is supposed to prove. Edit: Having messed around with it now (and read the .pdf), I think they've left behind their original principle of making tests that are easy for humans and hard for machines. I'm still not convinced that a model that's good at these sorts of puzzles is necessarily better at reasoning in the real world, but I'm open to being convinced otherwise.
WarmWash 5 hours ago:
The goal is to learn the rules and then use them to win. If you mess around a little, you'll figure it out; there are only a few rules.
szatkus 5 hours ago:
> Only environments that could be fully solved by at least two human participants (independently) were considered for inclusion in the public, semi-private and fully-private sets.

Apparently those games are supposed to be hard.