Remix clone Hacker News

	▲	heipei 4 hours ago
		Same here: I use Qwen 3.6 27b (Q6 quant) with llama.cpp on an RTX 5090, using the pi agent exclusively now. Because it's local, I never have to think about token pricing, quotas, time of day, or data sensitivity. I have limited the GPU's power from 600 W to 450 W, which means the system stays whisper quiet during inference. I have become so "lazy" (in a good way) that I've started using the model for lots of mundane daily things on top of just coding:
		* "commit this on a branch, push, create a PR and assign $nickname for review"
		* "Use the Stripe CLI to download all open and overdue invoices and reconcile them with this CSV export from our bank account."
		* "Use these Elasticsearch credentials to summarise what kind of operations are causing load at the moment."
		* "Tell me if our codebase already supports X and where it's implemented."
	▲	amarshall 42 minutes ago
		What context length and kv cache quant (if any) are you using? And MTP?