em-bee 4 hours ago
yes, but a model that runs on my own machine will never have the capacity of a model that runs in a datacenter. as i said, it can't compete with that.
randbyte 2 hours ago
> a model that runs on my own machine will never have the capacity of a model that runs in a datacenter.

I don’t think so. A locally run model only needs to serve one or a few people. It seems possible to run a DeepSeek v4 model at full capacity on a server costing around 200k USD. Very expensive, but not impossible. Factor in hardware and software improvements over time, plus the fact that most people may only need a smaller, quantized model, and a PC in the 10k USD range should be enough.
thewebguyd 4 hours ago
If RAM prices ever come down, you can have a machine that can run a capable local model. Qwen 2.5 72B is surprisingly capable, almost on par with GPT-4o if not a little better. You can run it on a 128GB Mac Studio with 8-bit quantization. You need about 77GB for the weights and ~15GB for your context window & cache. Pricing remains to be seen, but there are also those new Nvidia laptops coming out: the Surface Laptop Ultra should have 128GB RAM with a Blackwell GPU, and they're claiming 1 petaflop of AI compute, if you can tolerate Windows (no idea if it'll boot Linux until the hardware is out). These models are roughly a year or less behind the frontier models. We really just need hardware to catch up and the price pressure on RAM to ease.
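
For anyone who wants to sanity-check the ~77GB and ~15GB figures, here is a rough back-of-envelope sketch. Everything numeric in it is an assumption rather than something stated above: roughly 72.7B parameters, about 8.5 effective bits per weight for a Q8_0-style 8-bit format (8-bit values plus per-block scales), and an architecture of 80 layers, 8 KV heads, and head dimension 128 with a 16-bit KV cache. The function names are made up for illustration.

```python
GB = 1e9  # decimal gigabytes, to match how file and RAM sizes are usually quoted


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the model weights alone."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both a key and a value
    vector per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed figures for a 72B-class model; check the published model config
# before relying on them.
weights = weight_bytes(n_params=72.7e9, bits_per_weight=8.5)
kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                    context_tokens=32_768)

print(f"weights:  {weights / GB:.1f} GB")
print(f"kv cache: {kv / GB:.1f} GB at 32k context")
print(f"total:    {(weights + kv) / GB:.1f} GB")
```

Under those assumptions the weights come to about 77 GB and a 32k-token cache to about 11 GB, so the total sits under 90 GB before runtime overhead. A longer context pushes the cache toward the ~15GB mentioned above, and it still fits in 128GB of unified memory.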