| ▲ | heipei 4 hours ago | |
Same here: I use Qwen 3.6 27b (Q6 quant) with llama.cpp on an RTX 5090, using the pi agent exclusively now. Because it's local, I never have to think about token pricing, quotas, time of day, or data sensitivity. I have limited the GPU's power draw from 600W to 450W, which keeps the system whisper quiet during inference. I have become so "lazy" (in a good way) that I've started using the model for lots of mundane daily things on top of just coding:
| ▲ | amarshall 42 minutes ago | |
What context length and KV cache quant (if any) are you using? And are you using MTP?