r_lee 4 hours ago

I think that's because there's less local AI usage now: there are all kinds of image models from the big labs, so there's no real rush of people self-hosting Stable Diffusion etc. anymore

the space moved from consumer to enterprise pretty fast as models got bigger

zozbot234 4 hours ago

Today's free models are not really bigger once you account for MoE (with ever-increasing sparsity, meaning a smaller fraction of active parameters) and better ways of managing the KV cache. You can do useful things with very little RAM/VRAM; it just gets slower and slower the more you try to squeeze it where it doesn't quite belong. But that's not a problem if you're willing to wait for every answer.
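
To put rough numbers on the sparsity point, here's a back-of-envelope sketch (purely illustrative; all sizes below are made-up assumptions, not any specific model):

    # Sparse MoE: only top_k of num_experts experts run per token,
    # so the active parameter count is a small slice of the total.
    shared_params = 4e9   # attention, embeddings, etc. (used every token)
    expert_params = 2e9   # parameters per expert FFN
    num_experts = 64
    top_k = 4             # experts activated per token

    total = shared_params + num_experts * expert_params   # 132B-class model
    active = shared_params + top_k * expert_params        # ~12B active
    print(f"total {total/1e9:.0f}B params, active {active/1e9:.0f}B "
          f"({100 * active / total:.0f}% touched per token)")

So a "big" checkpoint on disk can still have a fairly small per-token working set, which is what makes the RAM/VRAM trade-off workable.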

r_lee 23 minutes ago

yeah, but I mean more like the old setups where you'd just load a model onto a 4090 or something. Even with MoE it's a lot more complex and takes more VRAM, right? It just seems hard to justify for most hobbyists

but maybe I'm just slightly out of the loop

zozbot234 10 minutes ago

With sparse MoE it's worth keeping the experts in system RAM, since that lets you transparently use mmap and inactive experts can stay on disk. Of course that's also a slowdown unless you have enough RAM for the full set, but it lets you run much larger models on smaller systems.
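
The mmap mechanism in one small sketch (this is the general OS behavior that loaders like llama.cpp rely on when they mmap weights, not anyone's exact setup; the file name and offsets are placeholders):

    import mmap
    import os

    # Map the whole weights file; the OS pages data in lazily, so bytes
    # you never read (inactive experts) never have to leave disk.
    fd = os.open("model-weights.bin", os.O_RDONLY)  # placeholder file name
    size = os.fstat(fd).st_size
    weights = mmap.mmap(fd, size, access=mmap.ACCESS_READ)

    # Touching one expert's slice faults in only those pages:
    expert_offset, expert_nbytes = 0, 4096          # placeholder layout
    expert_block = weights[expert_offset : expert_offset + expert_nbytes]

If the active experts fit in RAM, the page cache keeps them hot; if not, you're paging from disk on expert switches, which is where the slowdown comes from.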