lacoolj 21 hours ago

lol I love how OpenAI just straight up doesn't compare their model to others on these release pages. Basically telling us they know Gemini and Opus are better but they don't want to draw attention to it.

qwesr123 21 hours ago | parent | next [-]

Not sure why they don't compare with others, but they are actually leading on the benchmarks they published. See here (bottom) for a chart comparing to other models: https://marginlab.ai/blog/swe-bench-deep-dive/

whimsicalism 20 hours ago | parent | next [-]

is swe-bench saturated? or did they switch to swe-bench pro because...?

Mkengin 17 hours ago | parent [-]

At least on swe-rebench it does pretty well: https://swe-rebench.com/

mistercheph 20 hours ago | parent | prev [-]

It's like Apple: they just don't want users or anyone else to even be thinking of their competitors. The competition doesn't exist; it's not relevant.

dbbk 20 hours ago | parent | prev [-]

This was the one thing I scanned for. No comparison against Opus. See ya.

Mkengin 17 hours ago | parent [-]

Though this Codex version isn't on the leaderboard, GPT-5.2-Medium already seems to be a bit better than Opus 4.5: https://swe-rebench.com/

gizmodo59 16 hours ago | parent [-]

Is that your website or something? You keep promoting it