| ▲ | handcrafted 11 hours ago | |
GPT-5.5 and Opus 4.7 are comparable when both run in the same harness, mini-swe-agent. GPT-5.5 shows a significant performance delta only when integrated with Codex. We hypothesize that Opus 4.7 performs better on mini-swe-agent than on the more complex Claude Code harness because mini-swe-agent's tight edit-run-check feedback loop is well suited to the CAD generation task. | ||
| ▲ | bigskydog 4 hours ago | parent [-] | |
There is also a recently released benchmark called BenchCAD that shows similar results: Opus 4.7 seems to be the best. https://benchcad.github.io/BenchCAD_webpage/ | ||