usagisushi 16 hours ago
Not the OP, but their setup must be faster than my 4060 16GB + 3060 12GB setup. Here are my numbers (typical values, N=1):
- All models: UD-Q4 w/ MTP. Context size: ~100k (MoE) / ~70k (Dense).
- Layer splitting used. Tensor splitting is ~1.2x faster in TG, but power spikes from 150W to 380W.
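For a rough sense of that trade-off, here is a back-of-the-envelope sketch of energy per generated token. It assumes the 150W and 380W figures can be treated as sustained draw measured the same way in both modes, which the comment does not confirm (they are described as spikes), so read the result as an upper-bound illustration. The function name `energy_per_token_ratio` is made up for this example.

```python
def energy_per_token_ratio(speedup: float, power_before_w: float, power_after_w: float) -> float:
    """Relative energy per generated token after switching split modes.

    Energy per token = power / (tokens per second), so the ratio is
    (power_after / power_before) / speedup.
    """
    return (power_after_w / power_before_w) / speedup


# Figures from the comment: ~1.2x faster TG, 150W -> 380W.
# Assumption: both wattages are comparable sustained draw, not brief peaks.
ratio = energy_per_token_ratio(speedup=1.2, power_before_w=150.0, power_after_w=380.0)
print(f"{ratio:.2f}x energy per token")  # prints "2.11x energy per token"
```

Under that assumption, tensor splitting would cost roughly twice the energy per token for a ~20% speedup, which fits the choice of layer splitting here.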