| ▲ | Aeroi 4 hours ago | |
The harness definitely makes a difference on benchmarks. I ran my agent, Camera Search, against a few benchmarks and was able to beat Opus 4.7. I also created a real-world benchmark for mining, oil & gas, construction, etc. called FieldOps-bench, and it basically proves that vertical agents and specialized harnesses, tools, and systems still outperform SOTA models alone. | ||