mft_ 4 days ago

I think we're still at least an order of magnitude away (in terms of affordable local inference, model improvements that squeeze more from less, or a combination of the two) from local solutions being seriously competitive for general-purpose tasks, sadly.

I recently bought a second-hand 64GB Mac to experiment with. Even with the biggest recent local models it can run (llama3.3:70b just about runs acceptably; I've also tried an array of Qwen3 30B variants), the quality is lacking for coding support. They can sometimes write and iterate on a simple Python script, but sometimes fail; and as general-purpose assistants they often fail to answer questions accurately (unsurprisingly, considering that a model is a compression of knowledge, and these are comparatively small models). They are far, far away from the quality and ability of the currently available Claude/Gemini/ChatGPT models. And even with a good eBay deal, the Mac cost the current equivalent of ~6 years of a monthly subscription to one of those.
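For context, this is roughly the kind of loop I was testing with. A minimal sketch, assuming the models were pulled through Ollama (which is what the llama3.3:70b naming suggests) and using its Python client; the model name and prompt here are purely illustrative:

    # Minimal sketch: querying a locally pulled model via the ollama Python client.
    # Assumes `ollama serve` is running and the model has been pulled beforehand,
    # e.g. `ollama pull llama3.3:70b`. Model name and prompt are illustrative.
    import ollama

    response = ollama.chat(
        model="llama3.3:70b",
        messages=[{"role": "user",
                   "content": "Write a Python function that reverses a string."}],
    )
    print(response["message"]["content"])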

Based on the current state of play, once we can access relatively affordable systems with 512-1024 GB of fast (V)RAM and sufficient FLOPS to match, we might have a meaningfully powerful local solution. Until then, I fear local-only is for enthusiasts/hobbyists and niche, non-general tasks.
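A rough back-of-envelope for why that memory range matters, assuming weights dominate the footprint and ignoring KV cache and runtime overhead; the parameter counts below are illustrative, not tied to any particular frontier model:

    # Back-of-envelope sketch: memory needed just to hold model weights.
    # Ignores KV cache, activations, and runtime overhead (all assumptions).
    def weights_gb(params_billion: float, bits_per_param: int) -> float:
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for params in (70, 405, 700):        # illustrative parameter counts (billions)
        for bits in (4, 8, 16):          # common quantization widths
            print(f"{params}B @ {bits}-bit ~ {weights_gb(params, bits):.0f} GB")
    # e.g. a ~700B-parameter model at 8-bit already needs ~700 GB for weights alone,
    # which lands squarely in the 512-1024 GB range above.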

hadlock a day ago | parent

It would not surprise me at all to see 512, 768, or 1024 GB machines targeted at commercial or home users in the next 5 years. I can imagine a lot of companies, particularly regulated ones like finance, defense, and medical, wanting to run the models in-house, inside their own datacenter. A single card or pair of cards would probably be more than adequate for a thousand or more users, or half a dozen developers. If you already have a $25,000 database server, $12,000 for an "AI server" isn't a wild ask.
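A crude way to sanity-check that sizing claim; every number below is an illustrative assumption, not a benchmark of any particular card or model:

    # Crude capacity sketch: how many chat users one inference box might cover.
    # Every number here is an assumed, illustrative figure.
    aggregate_tok_per_s = 3000       # assumed total generation throughput of the box
    tok_per_request = 500            # assumed average response length
    requests_per_user_hour = 6       # assumed light interactive usage

    tok_per_user_s = tok_per_request * requests_per_user_hour / 3600
    supported_users = aggregate_tok_per_s / tok_per_user_s
    print(f"~{supported_users:,.0f} light users per box under these assumptions")

Under those (generous) assumptions a single box covers a few thousand light users, which is the same ballpark as the claim above; heavy agentic/coding use would eat into that quickly.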