taspeotis 5 hours ago
The harness is super important: which tools are available and what the system prompts say vary from harness to harness. Anthropic seems to have a modest lead on both their harness and their models, so it's a best-of-both-worlds scenario.

> I'm not sure what Microsoft is doing behind the scenes

It's probably the exact same model, but the tools and the prompts around it are worse, so you get worse results.
irthomasthomas 2 hours ago
Claude in Claude Code has been shown to perform consistently worse in evals than Claude with a minimal harness.
kilburn 4 hours ago
The harness was absolutely not the issue in my case. The new pricing model was: I got banned from using Opus entirely, and half a day of work (with weaker models) used up the $10 plan. I'm now on a Claude Max subscription. I can get close to the daily limits, but I'm fairly happy with the overall plan consumption.
Vinnl 5 hours ago
So if you use Claude via Copilot in Zed, you're using Zed's harness, I think? What does Copilot do at that point?