maxall4 2 hours ago
Indeed, according to METR, Mythos only achieved an 80% success rate on 3-hour tasks. https://metr.org/time-horizons/
jwood27 2 hours ago
Those are tasks that would take a human 3 hours to complete, not tasks that the model works on for 3 hours.