throwaw12 5 hours ago
How is it that Meta spent so much money on talent and hardware, yet the model barely matches Opus 4.6? Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce, or everyone else is less capable than the talent Anthropic has.
strulovich 5 hours ago
Meta made a bunch of mistakes, and it looks like Zuckerberg spent a lot of money on talent and made big swings to change that (which happened about a year ago). I think it's unrealistic to expect them to climb from that pit back to the top in one year, but I wouldn't rule out their getting there with more time. That's a possible future. They have the money and Zuckerberg's drive at the helm. That can go a long way.
solenoid0937 5 hours ago
It's benchmaxxed. If they had actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab, and they are prohibited from doing distills.)
coffeebeqn 5 hours ago
Wouldn't matching Opus 4.6 be pretty good? It's the SOTA among actually available models.
impulser_ 5 hours ago
It's not even on par with Sonnet. It's on par with open-source models, and it's not even open source; it sits behind a private preview API. They might as well not have released anything.
CuriouslyC 2 hours ago
Anthropic has mostly just been focused on coding/terminal work longer, and their pro-tier model is coding focused, unlike the GPT and Gemini pro-tier models, which have been optimized for science. Their whole "training the LLM to be a person" technique probably contributes to its pleasant conversational behavior, makes its refusals less annoying (GPT 5.2+ got obnoxiously aligned), and also contributes a bit to its greater autonomy. Overall they don't have any real moat, but they are more focused than their competition (and their marketing team is slaying).
username223 5 hours ago
Facebook is working with the talent that can't find a job at some other company. It doesn't surprise me that they ship mediocrity.
zozbot234 5 hours ago
> has some secret sauce

Yup, it's called test-time compute. Mythos is described as considerably slower than Opus, enough to seriously annoy users trying to use it for quick-feedback-loop agentic work. It is most properly compared with GPT Pro, Gemini DeepThink, or this latest model's "Contemplating" mode. Otherwise you're just not comparing like for like.