GeekyBear 5 days ago
A discrete consumer GPU doesn't have enough fast VRAM to run a very large model that hasn't been quantized to hell. That's why all the projects streaming models into the GPU from an SSD have popped up recently.
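A rough way to see the VRAM problem is to multiply parameter count by bytes per weight. The sketch below is a back-of-envelope calculation under loudly stated assumptions: a hypothetical 70B-parameter dense model, a 24 GB consumer card, and idealized bits-per-weight for each format (real quantization formats carry some overhead). It ignores the KV cache and activations, which only make things worse.

```python
# Back-of-envelope: do a model's weights fit in a consumer GPU's VRAM?
# All figures are illustrative assumptions, not measurements.

# Idealized storage per weight; real quantized formats add scale/metadata overhead.
BYTES_PER_PARAM = {
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
    "int2": 0.25,  # roughly "quantized to hell"
}


def weights_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring KV cache and activations."""
    return params_billions * BYTES_PER_PARAM[dtype]


def fits_in_vram(params_billions: float, dtype: str, vram_gb: float) -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return weights_gb(params_billions, dtype) <= vram_gb


if __name__ == "__main__":
    VRAM_GB = 24.0  # assumed high-end consumer card
    PARAMS_B = 70.0  # hypothetical 70B dense model
    for dtype in BYTES_PER_PARAM:
        size = weights_gb(PARAMS_B, dtype)
        print(f"70B @ {dtype}: {size:.1f} GB, fits in {VRAM_GB:.0f} GB: {fits_in_vram(PARAMS_B, dtype, VRAM_GB)}")
```

Under these assumptions only the most aggressive quantization gets the weights under 24 GB, which is the gap SSD-streaming projects try to bridge.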
manmal 4 days ago
Yes. There's just no way to get above 1 token/s that way with a large model.
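The ceiling follows from bandwidth: if the weights that don't fit in VRAM have to be re-read from the SSD for every generated token, decode speed can't exceed SSD throughput divided by the bytes streamed per token. This is a minimal sketch assuming a dense model (every weight touched once per token), no caching in system RAM, and zero compute time; the figures in the example (140 GB of fp16 weights, a 24 GB card, an NVMe drive sustaining about 7 GB/s) are hypothetical.

```python
# Upper bound on decode speed when weights that don't fit in VRAM are
# re-read from an SSD for every generated token.
# Assumptions: dense model (all weights used per token), no system-RAM
# caching, compute time ignored. Real throughput will be lower.


def max_tokens_per_second(weights_gb: float, vram_gb: float, ssd_gb_per_s: float) -> float:
    # Bytes that must come off the SSD for each token.
    streamed_gb = max(weights_gb - vram_gb, 0.0)
    if streamed_gb == 0.0:
        return float("inf")  # everything resident; the SSD is not the bottleneck
    return ssd_gb_per_s / streamed_gb


if __name__ == "__main__":
    # Hypothetical figures: ~140 GB of fp16 weights, a 24 GB card, ~7 GB/s NVMe reads.
    tps = max_tokens_per_second(weights_gb=140.0, vram_gb=24.0, ssd_gb_per_s=7.0)
    print(f"upper bound: {tps:.3f} tokens/s")
```

With these numbers the bound is well under 1 token/s; to reach even 1 token/s, the streamed portion would have to shrink to about as many GB as the SSD reads per second.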