embedding-shape 7 hours ago

> I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful

I've had mild success with GPT-OSS-120b (MXFP4, ends up taking ~66GB of VRAM for me with llama.cpp) and Codex.

I'm wondering whether one could crowdsource chat logs from GPT-OSS-120b running with Codex, then use the good runs to seed a post-training run that fine-tunes the 20b variant, and whether that would make a big difference. Both models are actually quite good compared to other downloadable models with reasoning_effort set to high, but the 120b is just about out of reach for 64GB, so making the 20b better at specific use cases seems like it'd be useful.
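To make that concrete, here's a rough sketch of what I have in mind for filtering the crowdsourced 120b logs into a fine-tuning set for the 20b. The file names, record fields, and the "good run" heuristic are all made up; the real filter would depend on how the logs get collected:

    # Sketch: filter crowdsourced GPT-OSS-120b/Codex transcripts into an SFT
    # dataset for the 20b variant. File names, fields, and the "good run"
    # heuristic (task marked solved, no tool errors) are hypothetical.
    import json

    def is_good_run(run: dict) -> bool:
        # Keep only transcripts where the agent reported success and
        # no tool call came back with an error.
        return run.get("task_solved", False) and not any(
            msg.get("role") == "tool" and msg.get("is_error")
            for msg in run.get("messages", [])
        )

    with open("crowdsourced_120b_runs.jsonl") as src, \
         open("gpt-oss-20b_sft.jsonl", "w") as dst:
        for line in src:
            run = json.loads(line)
            if is_good_run(run):
                # Standard chat-style SFT record: the 20b learns to imitate
                # the 120b's full message trajectory.
                dst.write(json.dumps({"messages": run["messages"]}) + "\n")

The output is just the usual messages-list JSONL that most fine-tuning tooling accepts, so the interesting part is really the filtering heuristic.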

andai 4 hours ago

Are you running the 120B agentically? I tried using it in a few different setups and it failed hard in every one: it would just give up after a second or two, every time.

I wonder if it has to do with the message format, since it should be able to do tool use afaict.
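For what it's worth, here's the kind of minimal tool-use smoke test I'd run against the local server. The port, model name, and tool schema are placeholders; the request shape is just the standard OpenAI-compatible chat format:

    # Minimal tool-use smoke test against a local OpenAI-compatible server
    # (e.g. llama.cpp's llama-server) running gpt-oss-120b. The endpoint,
    # model name, and tool schema are assumptions for illustration.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "gpt-oss-120b",
            "messages": [
                {"role": "user", "content": "List the files in the repo root."}
            ],
            "tools": [{
                "type": "function",
                "function": {
                    "name": "list_files",
                    "description": "List files in a directory",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                },
            }],
        },
        timeout=120,
    )
    # If the model "gives up" immediately, the finish_reason and the
    # presence/absence of tool_calls here are the first things to check.
    print(resp.json()["choices"][0])

If it answers in plain text with no tool_calls at all, that would point at the chat template / message format rather than the model itself.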

pocksuppet 2 hours ago

You're describing distillation. There are better ways to do it, and it has been done before: DeepSeek distilled R1 onto Qwen.

gigatexal 7 hours ago

I have a 128GB M3 Max MacBook Pro. Running the GPT-OSS model on it via LM Studio, once the context gets large enough the fans spin up to 100% and it's unbearable.

pixelpoet 6 hours ago

Laptops are fundamentally a poor form factor for high performance computing.

embedding-shape 6 hours ago

Yeah, Apple hardware doesn't seem ideal for large LLMs. Give it a go with a dedicated GPU if you're inclined and you'll see a big difference :)

politelemon 3 hours ago

What are some good GPUs to look for if you're getting started?