Interesting; do you find they actually react the same way to the harness?

There are differences for sure. Claude models feel the most 'stable' in that I see fewer tool-confusion messages and other mistakes like the one I'm looking at right now:

"Wait, I'm editing the wrong sections. The edit tool tried to match but replaced with different prop names than what was in the file. Let me re-read the file and understand the current state properly."

And of course models are not 1-to-1; they have different strengths and weaknesses. I know I probably won't get the same quality of plan-mode output. It's a tradeoff.

	cyanydeez 6 hours ago
		I generally assume the differences could be minimized by tailoring the instructions to each model. It's not that they're incapable of doing the same things, but the way they're instructed matters because the instructions need to draw on what each model was trained on. But I don't use any of the cloud stuff; I'm local4lyfe.