vessenes 4 hours ago

This is an interesting document, in that it reads like a Claude Mythos model card that was hastily edited to be an Opus 4.7 model card.

I surmise that someone at the top put the Mythos release on hold, and the product team was told "ship this other interim-step model instead, quickly."

I wonder if 4.7 will be seen as a net step-up in quality; there are some regressions noted in the document, and it's clearly substantially worse than Mythos, at least according to its own model card. Should be an interesting few months -- if I were at oAI I'd be rushing to get something out that's clearly better, and pressing on this weakness.

the13 4 hours ago

What makes you think that "it reads like a Claude Mythos model card that was hastily edited to be an Opus 4.7 model card"?

vessenes 4 hours ago

There are more mentions of Mythos than of 4.6. Mythos results are nearly everywhere, and vastly exceed 4.7's in almost every case. There are sections that report research only on Mythos and none on 4.7, e.g. user surveys about how beneficial Mythos is internally at Anthropic.

barneybooroo 3 hours ago

Yeah, the section expanding on how they evaluated Mythos internally is a bit baffling considering how irrelevant it is.