| ▲ | vibe42 16 hours ago | |
With 16 GB of VRAM one can run a decent quant (Q4-Q8) of the newer, smaller dense models, which leaves room for a 32-256k context. That might not be enough to chew through a large code base, but for smaller projects it can easily fit most if not all of the code, which is enough to drive a good coding agent. I don't recommend specific models or model providers given how much hype and BS there is around benchmarks etc. Easiest is to check the latest generation of open models and look for a dense one where a decent quant fits within the VRAM. Some models run fast enough that part of the weights can spill over from VRAM to RAM while maintaining a usable prompt processing / token generation speed. | ||
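For anyone wanting to sanity-check the sizing claim, here is a rough back-of-envelope sketch (all model numbers are made-up illustrative assumptions, not specs of any particular model): weight memory is roughly params × bits-per-weight / 8, and the KV cache grows linearly with context length.

```python
# Rough VRAM estimate for a quantized dense model + KV cache.
# All concrete numbers below are illustrative assumptions.

def model_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.1) -> float:
    """Approximate weight memory in GB for a dense model.
    params_b: parameter count in billions; overhead covers buffers etc."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 * overhead

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, ctx: int, bytes_per: int = 2) -> float:
    """KV cache in GB: 2 (K and V) * layers * kv_heads * head_dim * ctx tokens * bytes."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per / 1e9

# Hypothetical 14B dense model at ~4.5 bits/weight (Q4-ish incl. scales),
# 40 layers, 8 KV heads (GQA), head_dim 128, fp16 cache, 32k context:
weights = model_vram_gb(14, 4.5)
cache = kv_cache_gb(40, 8, 128, 32_768)
print(f"weights ~{weights:.1f} GB, kv cache ~{cache:.1f} GB, total ~{weights + cache:.1f} GB")
# → weights ~8.7 GB, kv cache ~5.4 GB, total ~14.0 GB
```

So under those assumptions a 14B model at Q4 with 32k context squeezes into 16 GB; doubling the context would blow past it, which is where spilling weights to RAM or a smaller quant comes in.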
| ▲ | solstice 13 hours ago | parent [-] | |
Thank you! | ||