zozbot234 4 hours ago
Today's free models are not really bigger once you account for the use of MoE (with ever-increasing sparsity, meaning a smaller fraction of the parameters is active per token) and for better ways of managing the KV cache. You can do useful things with very little RAM/VRAM; it just gets slower and slower the more you try to squeeze a model where it doesn't quite belong. But that's not a problem if you're willing to wait for every answer.
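To put rough numbers on the point above, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the model shapes are hypothetical rather than any real release, the function names are made up, and the formulas are the usual first-order estimates (weights = parameters times bits per weight, KV cache = keys plus values for every layer and position). It ignores activations, runtime overhead, and offloading.

```python
# Rough memory estimates for dense vs. MoE models.
# All shapes below are hypothetical, chosen only to illustrate the arithmetic.

GIB = 2**30


def weight_gib(total_params: float, bits_per_weight: float) -> float:
    """Memory needed to hold every weight, in GiB."""
    return total_params * bits_per_weight / 8 / GIB


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the parameters actually used for each token."""
    return active_params / total_params


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache for one sequence: keys and values per layer and position."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / GIB


if __name__ == "__main__":
    # Hypothetical models with the same total size but different sparsity.
    models = {
        "dense": {"total": 30e9, "active": 30e9},
        "moe": {"total": 30e9, "active": 3e9},
    }
    for name, m in models.items():
        print(name,
              f"weights at 4 bits: {weight_gib(m['total'], 4):.1f} GiB,",
              f"active per token: {active_fraction(m['total'], m['active']):.0%}")

    # Fewer KV heads (as in grouped-query attention) shrink the cache.
    for kv_heads in (32, 8):
        size = kv_cache_gib(layers=32, kv_heads=kv_heads, head_dim=128,
                            context_len=8192)
        print(f"kv_heads={kv_heads}: {size:.1f} GiB of KV cache")
```

Under these assumptions the MoE model needs just as much storage for its weights as the dense one, but touches only a fraction of them per token, which is why it can stay usable when part of it sits in slower memory.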
r_lee 25 minutes ago | parent
Yeah, but I mean more like the old setups where you'd just load a model on a 4090 or something. Even with MoE it's a lot more complex and takes more VRAM, right? It just seems hard to justify for most hobbyists, but maybe I'm slightly out of the loop.