jasonjmcghee 3 hours ago
> if you want to understand the effects of quantization on model quality, it's really easy to spin up a GPU server instance and play around

FWIW, not necessarily. I've noticed quantized models have strange and surprising failure modes where everything seems to be working well, and then the model death-spirals, repeating a specific word, or completely fails on one task out of a handful of similar tasks. 8-bit vs 4-bit can be almost imperceptible or night and day. This isn't something you'd necessarily catch just playing around, but it shows up when you're trying to do something specific.
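
If you do want to test for this, here's a minimal sketch of the kind of side-by-side check I mean, assuming Hugging Face transformers with bitsandbytes; the model ID, prompts, and the repetition heuristic are all placeholders, not anything specific:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder: whatever model you're evaluating
    PROMPTS = [
        "Summarize the following ticket: ...",       # placeholder: your actual, specific tasks
        "Extract every date from this email: ...",
    ]

    def generate_all(quant_config):
        # load the model under a given quantization config and run every task prompt
        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID, quantization_config=quant_config, device_map="auto"
        )
        outputs = []
        for prompt in PROMPTS:
            inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
            out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
            outputs.append(tokenizer.decode(out[0], skip_special_tokens=True))
        del model
        torch.cuda.empty_cache()
        return outputs

    def looks_like_death_spiral(text, min_repeats=8):
        # crude heuristic: flag any single word repeated many times in a row
        words, streak = text.split(), 1
        for a, b in zip(words, words[1:]):
            streak = streak + 1 if a == b else 1
            if streak >= min_repeats:
                return True
        return False

    eight_bit = generate_all(BitsAndBytesConfig(load_in_8bit=True))
    four_bit = generate_all(BitsAndBytesConfig(load_in_4bit=True,
                                               bnb_4bit_compute_dtype=torch.bfloat16))

    for i, (a, b) in enumerate(zip(eight_bit, four_bit)):
        print(f"task {i}: 8-bit spiral={looks_like_death_spiral(a)}, "
              f"4-bit spiral={looks_like_death_spiral(b)}, identical={a == b}")

The point isn't the heuristic itself, it's running the same narrow tasks across both quantization levels so the one task that falls apart actually shows up.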