maxall4 2 hours ago

Indeed, according to METR, Mythos only achieved an 80% success rate on 3-hour tasks. https://metr.org/time-horizons/

jwood27 2 hours ago

Those are tasks that would take a human 3 hours to complete, not tasks that the model works on for 3 hours.

jadar 38 minutes ago

That’s even smaller, then!