MostlyStable 9 hours ago

Why do you think they are bragging? Anthropic has long been the company to give us by far the most in-depth information about their models, both positive and negative. I read this as them just stating a fact about this model that users would want to know.

organsnyder 9 hours ago | parent | next [-]

I'm absolutely certain that their marketing team has input on (if not ownership of) these announcements.

gallerdude 9 hours ago | parent | next [-]

Of course. But is it really impossible that Dario’s directive to the marketing team is “try not to make us look bad, but also be honest about our models’ capabilities, so people can stay informed”?

MostlyStable 9 hours ago | parent | prev [-]

I find it interesting how two different directly opposed messages seem to have both been interpreted as being nothing but marketing speak.

MallocVoidstar 9 hours ago | parent | prev | next [-]

The preceding sentence is

>Our safety assessments found that Sonnet 5 shows an overall lower rate of undesirable behaviors than Sonnet 4.6, and is generally safer to use in agentic contexts.

which is obviously painting that as a good thing. So reading the next sentence as "in other good news" is reasonable.

MostlyStable 9 hours ago | parent [-]

While I'm still not sure I would characterize that as bragging, you're right that that is a fair interpretation. However, another fair interpretation is something along the lines of "the downside or cost of this positive thing is the following negative thing."

satvikpendem 9 hours ago | parent | prev [-]

Anthropic, most in-depth? That's laughable given how closed down they've been over the years. If you want in-depth, DeepSeek actually still publishes papers on their methods for anyone to implement, which has led to them being by far the most cost-efficient model provider for the performance.

MostlyStable 9 hours ago | parent [-]

I was talking about reporting on testing and capabilities. Yes, open models provide a greater amount of information about the development of the model and how to run it yourself, but I am quite confident that literally no other AI company, open or closed, conducts and reports as thoroughly on testing of their models' capabilities.