At the very bottom of the article, they posted the system card of their Mythos preview model [1].

In section 7.6 of the system card, it discusses Open self interactions. They describe running 200 conversations when the models talk to itself for 30 turns.

> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.

I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where others model such as Opus couldn't.

[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

▲

dakolli 2 hours ago | parent [-]

Typical Dario marketing BS to get everyone thinking Anthropic is on the verge of AGI and massaging the narrative that regular people can't be trusted with it.

	▲	airstrike an hour ago \| parent [-]
		I mean it's so obvious at this point and yet everyone falls from it every month. There's an IPO coming, everyone.