▲ | 827a 3 days ago |
Exactly, yeah. My point is that there's a lot more to running these models than just the raw memory bandwidth and GPU-available memory size, and the difference between a $6000 M4 Ultra Mac Studio and a $2000 AI Max 395+ isn't actually as big as the raw numbers would suggest.

On the flip side, though: running GPT-OSS-120B locally is "cool", but have people found useful, productivity-enhancing use cases that justify doing this over just loading $2000 into your OpenAI API account? That, I'm less sure of.

I think we'll get to the point where running a local-first AI stack is obviously an awesome choice; I just don't think the hardware or models are there yet. Next year's Medusa Halo, combined with another year of open-source model improvements, might be the inflection point.
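To make the "local-first stack" point concrete: local servers like Ollama and llama.cpp's llama-server expose an OpenAI-compatible endpoint, so the same client code can point at either a local model or the hosted API. Below is a minimal sketch, assuming Ollama is running on its default port and that a model tagged "gpt-oss:120b" has been pulled; the model tags and port are illustrative, not a recommendation of any particular setup.

    # Minimal sketch: one OpenAI client pointed at a local server, one at the hosted API.
    # Assumes Ollama is serving its OpenAI-compatible endpoint on the default port 11434
    # and that "gpt-oss:120b" has been pulled locally; swap in whatever model you run.
    from openai import OpenAI

    # Local backend: Ollama's OpenAI-compatible endpoint (api_key is ignored locally).
    local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    # Hosted backend: the regular OpenAI API (reads OPENAI_API_KEY from the environment).
    hosted = OpenAI()

    def ask(client: OpenAI, model: str, prompt: str) -> str:
        """Send a single chat prompt to the given backend and return the reply text."""
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    print(ask(local, "gpt-oss:120b", "Summarize the tradeoffs of running LLMs locally."))
    # print(ask(hosted, "gpt-4o-mini", "Same question, but answered by the hosted API."))

The only difference between the two paths is the base URL and model name, which is what makes switching to a local-first setup cheap to try once the hardware and open models are good enough.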
▲ | vid 3 days ago |
I use local AI fairly often for innocuous queries (health, history, etc.) that I don't want to feed the spy machines, plus I like the hands-on aspect. I would use it more if I had more time, and while I hear the 120B is pretty good (I mostly use Qwen 30B), I would use it a lot more if I could run some of the really great models. Hopefully Medusa Halo will be all that.