usagisushi 16 hours ago

Not the OP, but their setup must be faster than my 4060 16GB + 3060 12GB setup. Here are my numbers (typical values, N=1; pp = prompt processing, tg = token generation):

    Model               pp (t/s)   tg (t/s)
    Qwen 3.6 27B             900         29
    Qwen 3.6 35B-A3B        2100         85
    Gemma 4 31B              750         28
    Gemma 4 26B-A4B         2500         90

- All models: UD-Q4 w/ MTP. Context size: ~100k (MoE) / ~70k (Dense).

- Layer splitting was used. Tensor splitting is ~1.2x faster in TG, but power draw spikes from 150W to 380W.
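
For anyone weighing the tensor-split trade-off, here is a rough back-of-the-envelope sketch in Python. It assumes the 150W and 380W figures are sustained draw during generation (they may only be peaks) and that pp speed stays constant over the whole context (in practice it usually drops as the context fills). The function names are made up for illustration.

```python
def energy_per_token_ratio(speedup: float, watts_before: float, watts_after: float) -> float:
    """Relative energy per generated token after a mode switch.

    Energy per token is power / throughput, so the ratio is
    (watts_after / watts_before) / speedup.
    """
    return (watts_after / watts_before) / speedup


def prefill_seconds(context_tokens: int, pp_tokens_per_s: float) -> float:
    """Naive time to ingest a full context, assuming a constant pp rate."""
    return context_tokens / pp_tokens_per_s


# Tensor split vs layer split: ~1.2x TG speed, 150W -> 380W (assumed sustained).
ratio = energy_per_token_ratio(1.2, 150.0, 380.0)
print(f"tensor split: ~{ratio:.1f}x energy per generated token")

# Naive full-context prefill times from the table above (dense ~70k, MoE ~100k).
for name, ctx, pp in [
    ("Qwen 3.6 27B", 70_000, 900),
    ("Qwen 3.6 35B-A3B", 100_000, 2100),
    ("Gemma 4 31B", 70_000, 750),
    ("Gemma 4 26B-A4B", 100_000, 2500),
]:
    print(f"{name}: ~{prefill_seconds(ctx, pp):.0f} s to fill ~{ctx // 1000}k context")
```

Under those assumptions, tensor splitting costs roughly 2.1x the energy per generated token for the ~1.2x speedup, which is why I stick with layer splitting.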