SWE-bench will hit 90% this year (fabraix.com)
6 points by asfsf23423 14 hours ago | 5 comments
upmind 12 hours ago | parent [-]

Maybe an unpopular opinion, but I think at this point SWE-bench has done its part and we need a new benchmark, because Gemini being on or near the same level as Claude is obviously wrong.

3 hours ago | parent | next [-]
[deleted]
amazingamazing 11 hours ago | parent | prev | next [-]

I use both and think they’re comparable. AMA.

zachdotai 3 hours ago | parent [-]

Not sure which version of Gemini you are using, but Claude is so much better for me. Gemini is generally overeager to make a code change even when I am just asking conceptual questions, among other issues.

lern_too_spel 11 hours ago | parent | prev [-]

Gemini at the same level as Claude is believable. Gemini CLI is not at the same level as Claude Code.