usaar333 3 days ago

> Some sources mention that o3 scores 63.8 on SWE-bench, while Gemini 2.5 Pro scores 69.1.

It's the opposite. o3 scores higher.

SweetSoftPillow 2 days ago

On SWE-bench? Show your source.