simonw | 4 days ago
Right now you can run some of the best available open-weight models on a 512GB Mac Studio, which retails for around $10,000. Here's Qwen3-Coder-480B-A35B-Instruct running in 4-bit at 24 tokens/second: https://twitter.com/awnihannun/status/1947771502058672219 and DeepSeek V3 0324 in 4-bit at 20 tokens/second: https://twitter.com/awnihannun/status/1904177084609827054

You can also string two 512GB Mac Studios together using MLX to load even larger models. Here's DeepSeek R1 671B in 8-bit doing exactly that: https://twitter.com/alexocheema/status/1899735281781411907
zargon | 4 days ago | parent
What these tweets about Apple silicon never show you: waiting 20+ minutes for it to ingest 32k context tokens. (Probably a lot longer for these big models.)
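The arithmetic behind that wait is simple: prefill time is context length divided by prompt-processing throughput, and prompt processing on Apple silicon is typically much slower than the generation speeds the tweets quote. A quick sketch (the ~27 tok/s prompt-processing figure is an illustrative assumption, not a measured number for these models):

```python
def prefill_minutes(context_tokens: int, pp_tokens_per_sec: float) -> float:
    """Minutes spent ingesting (prefilling) the prompt before the
    first output token appears."""
    return context_tokens / pp_tokens_per_sec / 60

# Assumed prompt-processing speed of ~27 tok/s on a large MoE model:
# a 32k-token context then takes roughly 20 minutes of prefill.
print(round(prefill_minutes(32_768, 27.0)))  # -> 20
```

Generation tokens/second only tells you how fast text comes out once the model has finished reading your prompt; for long-context coding work, prefill throughput dominates the wait.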