poormathskills 6 days ago

Go look at their past blog posts. OpenAI only ever benchmarks against their own models.

This is pretty common across industries. The leader doesn’t compare themselves to the competition.

christianqchung 6 days ago | parent | next [-]

Okay, it's common across other industries, but not this one. Here are Google, Facebook, and Anthropic comparing their frontier models to others[1][2][3].

[1] https://blog.google/technology/google-deepmind/gemini-model-...

[2] https://ai.meta.com/blog/llama-4-multimodal-intelligence/

[3] https://www.anthropic.com/claude/sonnet

poormathskills 6 days ago | parent [-]

Right. Those labs aren’t leading the industry.

comp_throw7 6 days ago | parent [-]

Confusing take - Gemini 2.5 is probably the best general purpose coding model right now, and before that it was Sonnet 3.5. (Maybe 3.7 if you can get it to be less reward-hacky.) OpenAI hasn't had the best coding model for... coming up on a year, now? (o1-pro probably "outperformed" Sonnet 3.5 but you'd be waiting 10 minutes for a response, so.)

oofbaroomf 6 days ago | parent | prev | next [-]

Leader is debatable, especially given the actual comparisons...

dimitrios1 6 days ago | parent | prev | next [-]

There is no uniform tactic for this type of marketing. They will compare against whomever they need to in order to suit their marketing goals.

kweingar 6 days ago | parent | prev | next [-]

That would make sense if OAI were the leader.

awestroke 6 days ago | parent | prev | next [-]

Except they are far from the lead in model performance.

poormathskills 6 days ago | parent [-]

Who has a (publicly released) model that is SOTA is constantly changing. It’s more interesting to see who is driving the innovation in the field, and right now that is pretty clearly OpenAI (GPT-3, first multi-modal model, first reasoning model, etc.).

swyx 6 days ago | parent | prev [-]

also sometimes if you get it wrong you catch unnecessary flak