kristjansson 10 days ago
> quantization

12B params means about 12 GB of weights at 8 bits/param (basically lossless) and about 6 GB at 4 bits/param (the generally accepted 'pretty close' level). Not too bad? But how well the base model performs is still TBD, and that comes before thinking too much about quantization.
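For reference, the arithmetic behind those figures can be sketched as below. This is a rough estimate under stated assumptions: it counts weight storage only (no KV cache, activations, or per-block scale metadata that real quantization formats add), it uses GB = 10^9 bytes, and the function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: weights only. Real quantized files are somewhat larger
    because they also store scales and other metadata.
    """
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 12e9  # the 12B model discussed above

for bits in (16, 8, 4):
    print(f"{bits:>2} bits/param: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

Actual quantized file sizes will likely come out a bit above these numbers, since common 4-bit formats spend slightly more than 4 bits per parameter once scales are included.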
magicalhippo 9 days ago
Smaller models are less forgiving of quantization. For a 12B model I wouldn't expect Q4 to be "pretty close" unless it underwent quantization-aware training (QAT). Of course it's not set in stone; there's huge variance between models, so this one might surprise.