METR can barely measure Claude Mythos – 50% task horizon now exceeds 16 hours

		(hugonomy.com)
		1 point by GlyphWeaver_a 11 hours ago | 2 comments

	overthinker_jp 10 hours ago
		Capability benchmarks may become less meaningful once agents operate over long execution horizons with access to external tools and permissions. At that point the governance problem shifts toward execution boundaries and observability.