| ▲ | woadwarrior01 3 hours ago | |||||||
There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime. | ||||||||
| ▲ | nl 3 hours ago | parent [-] | |||||||
So - as the charts say - no statistical difference? Isn't this link am argument against the point you are making? | ||||||||
| ||||||||