oceanplexian a day ago:
I run models with Claude Code (using the Anthropic API feature of llama.cpp) on my own hardware, and it works every bit as well as Claude did twelve months ago. If you don't believe me and don't want to mess around with used server hardware, you can walk into an Apple Store today, pick up a Mac Studio, and try it yourself.
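For anyone who wants to try it, here's a minimal sketch of the setup. It assumes llama-server is serving its Anthropic-compatible endpoint on the default port, and that Claude Code honors its documented ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN overrides; the model filename is a placeholder.

    import os
    import subprocess

    # Start llama.cpp's llama-server; --jinja enables the model's chat
    # template, which tool calling generally needs.
    # The model filename and port are placeholders.
    server = subprocess.Popen([
        "llama-server",
        "-m", "gpt-oss-120b.gguf",
        "--port", "8080",
        "--jinja",
    ])

    # Point Claude Code at the local server. ANTHROPIC_BASE_URL is Claude
    # Code's endpoint override; the token value is arbitrary because
    # llama-server doesn't verify it by default.
    env = dict(
        os.environ,
        ANTHROPIC_BASE_URL="http://127.0.0.1:8080",
        ANTHROPIC_AUTH_TOKEN="local",
    )
    subprocess.run(["claude"], env=env)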
Eggpants a day ago:
I’ve been doing the same with GPT-OSS-120B and have been impressed. The only gotcha is that Claude Code expects a 200k context window, while that model tops out around 131k, so I have to run /compact when it gets close. I’ll have to see whether there’s a way to set the max context window in CC. I’ve been pretty happy with the results so far, as long as I keep the tasks small and self-contained.
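On the server side at least, the context length is settable; here's a minimal sketch, assuming llama-server's -c/--ctx-size flag (where 0 means "use the model's trained maximum") and a placeholder model filename:

    import subprocess

    # Serve GPT-OSS-120B at its full trained context (~131k tokens).
    # "-c 0" asks llama-server to use the model's native maximum;
    # an explicit "-c 131072" would pin it instead.
    subprocess.run([
        "llama-server",
        "-m", "gpt-oss-120b.gguf",
        "-c", "0",
        "--port", "8080",
    ])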
icedchai a day ago:
What's your preferred local model?