Remix clone Hacker News


usagisushi 16 hours ago
Not the OP, but their setup must be faster than my 4060 16GB + 3060 12GB setup. Here are my numbers (typical values, N=1):

| Model            | pp (prompt processing, t/s) | tg (text generation, t/s) |
|------------------|-----------------------------|---------------------------|
| Qwen 3.6 27B     | 900                         | 29                        |
| Qwen 3.6 35B-A3B | 2100                        | 85                        |
| Gemma 4 31B      | 750                         | 28                        |
| Gemma 4 26B-A4B  | 2500                        | 90                        |

- All models: UD-Q4 with MTP.
- Context size: ~100k tokens (MoE) / ~70k tokens (dense).
- Layer splitting used. Tensor splitting is ~1.2x faster in tg, but power draw spikes from 150W to 380W.
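The throughput figures above can be turned into a rough per-request latency with a minimal Python sketch. It assumes latency ≈ prompt tokens / pp + output tokens / tg and ignores model load time, sampling overhead, and the slowdown as the context fills. The 10,000-token prompt and 500-token reply are a hypothetical workload, not a measurement.

```python
# Rough end-to-end latency estimate from the pp/tg numbers in the table.
# Assumption: latency ~= prompt_tokens / pp + output_tokens / tg.
# Ignores load time, sampling overhead, and context-length slowdown.

THROUGHPUT = {
    # model: (pp t/s, tg t/s), typical values from the table (N=1)
    "Qwen 3.6 27B": (900, 29),
    "Qwen 3.6 35B-A3B": (2100, 85),
    "Gemma 4 31B": (750, 28),
    "Gemma 4 26B-A4B": (2500, 90),
}


def estimate_latency(model: str, prompt_tokens: int, output_tokens: int) -> float:
    """Return the estimated wall-clock seconds for a single request."""
    pp, tg = THROUGHPUT[model]
    return prompt_tokens / pp + output_tokens / tg


if __name__ == "__main__":
    # Hypothetical request: a 10,000-token prompt and a 500-token reply.
    for name in THROUGHPUT:
        seconds = estimate_latency(name, prompt_tokens=10_000, output_tokens=500)
        print(f"{name:<18} {seconds:6.1f} s")
```

For that hypothetical request the estimate is about 28.4 s (Qwen 3.6 27B), 10.6 s (Qwen 3.6 35B-A3B), 31.2 s (Gemma 4 31B) and 9.6 s (Gemma 4 26B-A4B), so the MoE models come out roughly 3x faster end to end on this setup.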