strken 5 hours ago

Check out RE-Bench and HCAST.

The tasks are obviously all of the form "Go do this, and if you get the following output you passed". Setting up a web server apparently takes 15 minutes for a human, which is news to me, since I can search for https://gist.github.com/willurd/5720255, find the Python one-liner, and copy it within about ten seconds.
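For anyone who hasn't seen it, the Python 3 one-liner in that gist is `python3 -m http.server 8000`. Below is a rough sketch of what a pass/fail check for that kind of task might look like. It is only my guess at the shape of such a check, not how the benchmarks actually score anything, and the helper names `start_static_server` and `check_server` are made up for illustration.

```python
import http.server
import threading
import urllib.request


def start_static_server(port=0):
    """Serve the current directory over HTTP in a background thread.

    port=0 asks the OS for any free port (an assumption made so the
    sketch does not collide with something already listening on 8000).
    """
    handler = http.server.SimpleHTTPRequestHandler
    httpd = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    return httpd


def check_server(httpd):
    """Hypothetical pass/fail check: fetch "/" and return the HTTP status."""
    port = httpd.server_address[1]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
        return resp.status


if __name__ == "__main__":
    server = start_static_server()
    print("passed" if check_server(server) == 200 else "failed")
    server.shutdown()
    server.server_close()
```

The point being that the "did you get the expected output" part is a few lines, and the "set up a web server" part is a standard-library call.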

Anyway, this is cool, but it does not mean Claude can perform any human task that takes less than 8 hours and is within its physical capabilities.