| ▲ | Aurornis 2 hours ago | |
Good point, but you still need room for the KV cache and more. Fitting the model weights alone into RAM doesn’t get the job done. | ||
| ▲ | segmondy 2 hours ago | |
Yeah, it doesn't take much. I'm looking at it right now: the KV cache is about 4 GB of VRAM and the compute buffer is about 1.5 GB at full 128k context. | ||
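
For anyone who wants to redo this arithmetic for their own setup, here is a rough sketch. The overhead figures (4 GB and 1.5 GB) are the ones quoted above. The KV cache formula is the usual estimate for a plain attention cache (keys plus values, per layer, per KV head). The layer count, head count, and head size in the example call are hypothetical placeholders, not the model discussed in this thread, and real runtimes may quantize or lay out the cache differently.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache size: keys + values for every layer, KV head, and token.

    Assumes an unquantized-style cache with a fixed element size
    (2 bytes = fp16 by default). All arguments are model/runtime specific.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def overhead_gb(kv_cache_gb, compute_buffer_gb):
    """Memory needed on top of the model weights, in GB."""
    return kv_cache_gb + compute_buffer_gb


# Figures quoted in the comment above (at full 128k context).
extra = overhead_gb(kv_cache_gb=4.0, compute_buffer_gb=1.5)
print(f"Overhead beyond the weights: {extra} GB")

# Hypothetical model shape, for illustration only.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=131072)
print(f"Hypothetical fp16 KV cache: {size / 2**30:.1f} GiB")
```

The point of the formula is that cache size scales linearly with context length and with the number of KV heads, so the same context window can cost very different amounts of memory on different models.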