embedding-shape 9 hours ago

Yeah, no way I'd do this if I paid per token. The next experiment will probably be local-only with GPT-OSS-120b, which according to my own benchmarks still seems to be the strongest local model I can run myself. It'll be even cheaper then (as long as we don't count the money it took to acquire the hardware).

mercutio2 6 hours ago | parent [-]

What toolchain are you going to use with the local model? I agree it's a strong model, but it's so slow for me with large contexts that I've stopped using it for coding.

embedding-shape 3 minutes ago | parent [-]

I have my own agent harness, and the inference backend is vLLM.
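For anyone curious what that kind of setup looks like, here's a minimal sketch of a harness loop talking to a local vLLM instance through its OpenAI-compatible endpoint. The model name, port, and the `ask` helper are illustrative assumptions, not details from the comment above.

    # Minimal sketch of a harness round trip against a local vLLM server.
    # Assumes vLLM is serving gpt-oss-120b on localhost:8000 via its
    # OpenAI-compatible API (e.g. started with `vllm serve openai/gpt-oss-120b`);
    # the model name and port are assumptions, not the poster's actual config.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    def ask(prompt: str) -> str:
        # Single chat-completion call; a real agent harness would wrap this
        # with tool calls, retries, and context management.
        resp = client.chat.completions.create(
            model="openai/gpt-oss-120b",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
        )
        return resp.choices[0].message.content

    if __name__ == "__main__":
        print(ask("Summarize what an agent harness does in one sentence."))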