| ▲ | rcarmo 3 hours ago | |
I don't think this is a good "benchmark" anymore. It's probably in everyone's training set by now. | ||
| ▲ | staticassertion 3 hours ago | |
I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That seems valid, and the assumption here is that the approach they take could generalize, which seems plausible. | ||