Do we know this is true? Did Anthropic release the exact prompt they used to uncover these security vulnerabilities? Or did they use it, target it like a black hat hacker would and then make a marketing campaign around how Mythos is so incredible that its unsafe to share with the public?

▲

CodingJeebus 3 hours ago | parent [-]

100% this. We've seen enough model releases at this point to know that there hasn't been a single model rollout making bold claims about its capability that wasn't met with criticism after release.

The fact that Anthropic provides such little detail about the specifics of its prompt in an otherwise detailed report is a major sleight of hand. Why not release the prompt? It's not publicly available, so what's the harm?

We can't criticize the methods of these replication pieces when Anthropic's methodology boils down to: "just trust us."

	▲	gruez 3 hours ago \| parent [-]
		>We've seen enough model releases at this point to know that there hasn't been a single model rollout making bold claims about its capability that wasn't met with criticism after release. Examples? All I remember are vague claims about how the new model is dumber in some cases, or that they're gaming benchmarks.