jkelleyrtp 4 hours ago
Claude's SWE-bench score is 80.8 and Codex's is 56.8. Seems like 4.6 is still all-around better?
gizmodo59 4 hours ago
It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.
Rudybega an hour ago
You're comparing two different benchmarks: Pro vs. Verified.