Not to sound like an LLM, but that seems exactly right to me. Use it as a cheaper, high-functioning task subagent and lower reasoning for a master Opus session. As long as not every portion of your task requires maximum intelligence, you should come out ahead.

user43928 9 hours ago | parent [-]

Won't all of the subagent's input be charged at the uncached rate, and then the small model's output be charged again as uncached input to the bigger model?

I don't know whether that comes out ahead compared to just staying with the better model in the first place.

mwigdahl 8 hours ago | parent [-]

It's a good question, but in multi-turn conversations even cached context adds up quickly, since the whole context is re-read on every turn. My experience has been that spawning subagents for well-defined tasks within a larger plan generally makes me come out ahead.

I'm sure folks' mileage will vary though.
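
For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of that trade-off. Everything in it is an assumption for illustration: the prices are made up (not any provider's real rates), the token counts are invented, and the model of caching (every prior token is a cache hit, every new token is uncached) is deliberately simplistic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical dollars per million tokens. Not real rates."""
    uncached_in: float
    cached_in: float
    out: float


# Made-up price points for a "big" and a "small" model.
BIG = Price(uncached_in=15.0, cached_in=1.5, out=75.0)
SMALL = Price(uncached_in=3.0, cached_in=0.3, out=15.0)

# Invented workload numbers (tokens).
MAIN_CTX = 100_000   # context already sitting in the main session
BRIEF = 5_000        # task description the main model writes for the subagent
SUMMARY = 1_000      # result the subagent hands back
NEW_IN = 2_000       # new input per turn (e.g. tool results)
OUT = 500            # output per turn


def session_cost(price, start_ctx, turns, new_in_per_turn, out_per_turn):
    """Dollar cost of `turns` turns in one session.

    Each turn re-reads all prior context at the cached rate, pays the
    uncached rate for newly added input, and pays for its output.
    """
    total = 0.0
    ctx = start_ctx
    for _ in range(turns):
        total += ctx * price.cached_in
        total += new_in_per_turn * price.uncached_in
        total += out_per_turn * price.out
        ctx += new_in_per_turn + out_per_turn
    return total / 1_000_000


def inline_cost(turns):
    """Run the whole subtask inside the big model's main session."""
    return session_cost(BIG, MAIN_CTX, turns, NEW_IN, OUT)


def delegated_cost(turns):
    """Hand the subtask to a small-model subagent with a fresh context."""
    write_brief = BRIEF * BIG.out / 1_000_000            # big model emits the brief
    read_brief = BRIEF * SMALL.uncached_in / 1_000_000   # subagent reads it uncached
    # The brief is also counted as cached context on the first turn,
    # which slightly overstates the subagent's cost.
    subagent = session_cost(SMALL, BRIEF, turns, NEW_IN, OUT)
    # One main-session turn to read the summary back in as uncached input.
    merge = session_cost(BIG, MAIN_CTX, 1, SUMMARY, OUT)
    return write_brief + read_brief + subagent + merge


if __name__ == "__main__":
    for turns in (1, 20):
        print(turns, round(inline_cost(turns), 4), round(delegated_cost(turns), 4))
```

With these invented numbers, the 20-turn subtask is cheaper when delegated, while the 1-turn subtask is cheaper inline because the fixed cost of writing the brief and merging the summary dominates. Where the crossover lands depends entirely on the prices and token counts you put in.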

noisy_boy 3 hours ago | parent [-]

I asked this question and was told that, counterintuitive as it sounds, medium would be more cost-efficient because of caching. I switched to medium, blew my budget, and went back to low.