cgdl 6 days ago
Very cool. For the INT4 QAT model, what precision is recommended for the activations, and for the keys and values stored in the KV cache?
hnuser123456 6 days ago
For keys, you probably want to use at least q5 or q6; for values, q4 is fine.
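To get a feel for what the extra bits buy, here is a rough sketch that round-trips random data through a simple symmetric per-block quantizer at a few bit widths and prints the RMS error. Everything in it is an assumption for illustration: the `quantize_roundtrip` helper, the block size of 32, and the Gaussian test data are made up, and this is not the actual q4/q5/q6 format of any particular runtime. A reason often given for keeping keys at higher precision is that key errors feed into the attention scores before the softmax, while value errors are only averaged afterwards.

```python
import random


def quantize_roundtrip(values, bits, block_size=32):
    """Quantize to signed `bits`-bit integers per block, then dequantize.

    Hypothetical helper: one scale per block, symmetric around zero.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4 bits, 31 for 6 bits
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Scale so the largest magnitude in the block maps to qmax.
        scale = max(abs(v) for v in block) / qmax or 1.0
        for v in block:
            q = max(-qmax, min(qmax, round(v / scale)))
            out.append(q * scale)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for K or V entries

errors = {
    bits: rms_error(data, quantize_roundtrip(data, bits))
    for bits in (4, 5, 6, 8)
}
for bits, err in errors.items():
    print(f"q{bits}: RMS error {err:.4f}")
```

Each extra bit roughly halves the quantization step, so the error drops quickly between q4 and q6. Whether that difference matters for a given model is something to check against real perplexity or eval numbers rather than this toy data.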