simonw 4 days ago

Right now you can run some of the best available open-weight models on a 512GB Mac Studio, which retails for around $10,000. Here's Qwen3-Coder-480B-A35B-Instruct running in 4-bit at 24 tokens/second: https://twitter.com/awnihannun/status/1947771502058672219 and DeepSeek V3 0324 in 4-bit at 20 tokens/second: https://twitter.com/awnihannun/status/1904177084609827054
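For anyone who wants to try this, here's a minimal sketch of running one of those quants with the mlx-lm package (pip install mlx-lm). The repo name is an assumption on my part, though mlx-community publishes 4-bit conversions under names like it:

    # Minimal sketch: load a 4-bit MLX quant and generate text.
    # The repo name below is an assumption; check mlx-community on Hugging Face.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen3-Coder-480B-A35B-Instruct-4bit")

    prompt = "Write a Python function that merges two sorted lists."
    # verbose=True prints generation speed (tokens/sec) alongside the output
    text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
    print(text)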

You can also string two 512GB Mac Studios together using MLX to load even larger models - here's DeepSeek R1 671B in 8-bit running across both: https://twitter.com/alexocheema/status/1899735281781411907
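The reason that one needs two machines is simple arithmetic - weights alone, ignoring KV cache and runtime overhead:

    # Back-of-envelope: weight footprint at different quantizations.
    # Ignores KV cache and activation memory, so real needs are higher.
    params = 671e9  # DeepSeek R1 parameter count

    for quant, bytes_per_param in [("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = params * bytes_per_param / 1e9
        verdict = "fits in one 512 GB Studio" if gb <= 512 else "needs two 512 GB Studios"
        print(f"{quant}: ~{gb:.0f} GB of weights -> {verdict}")

8-bit works out to ~671 GB of weights, which is why a single 512GB machine isn't enough, while a 4-bit quant (~336 GB) fits on one.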

zargon 4 days ago | parent

What these tweets about Apple silicon never show you is the prompt-processing wait: 20+ minutes to ingest 32k tokens of context. (Probably a lot longer for models this big.)
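To make the arithmetic concrete (the prefill rates below are illustrative assumptions, not benchmarks):

    # Illustrative only: how prompt-processing (prefill) speed turns into wait time.
    context_tokens = 32_768

    for prefill_tps in (25, 100, 500, 5000):  # assumed tokens/sec during prompt ingestion
        minutes = context_tokens / prefill_tps / 60
        print(f"prefill at {prefill_tps:>4} tok/s -> {minutes:6.1f} min to ingest {context_tokens} tokens")

At ~25 tokens/second of prefill you get the 20-minute wait; a GPU rig doing prefill at thousands of tokens/second makes the same prompt feel near-instant. Generation speed is the number that gets tweeted, but prefill speed is what you feel on long contexts.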

logicprog 4 days ago | parent

Yeah, I bought a used Mac Studio (an M1 Max, to be fair, but things haven't changed much since) hoping to run a decent LLM on it, and was sorely disappointed, especially by the prompt-processing speed.

alt227 3 days ago | parent

No offense to you personally, but I find it very funny when people hear marketing copy for a product and assume it can do everything it claims.

Apple silicon is still just a single consumer-grade chip. It might run certain end-user software well, but it cannot replace a server rack of GPUs.

zargon 3 days ago | parent

I don't think this is a fair take in this particular situation. My comment is in response to Simon Willison, who has a very popular blog in the LLM space. This isn't company marketing copy; it's trusted third parties spreading misleading information.