piperswe 10 months ago

How much of that is because the models are optimizing specifically for SWE-bench?

icpmacdo 10 months ago | parent [-]

Not that much, because it's getting better at all benchmarks.
