cgdl 6 days ago

Very cool. For the INT4 QAT model, what is the recommended precision for the activations, and for the keys and values stored in the KV cache?

hnuser123456 6 days ago | parent [-]

For keys, you probably want at least q5 or q6; for values, q4 is fine.
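
To get a rough feel for what that asymmetric split saves, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dim 128, 8192 tokens) is made up for illustration, `kv_cache_bytes` is a hypothetical helper, and the bit widths are nominal: real quantized formats usually add per-block scale overhead, so actual sizes will be somewhat larger.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, k_bits, v_bits):
    """Estimate KV cache size in bytes for separate key/value bit widths.

    Nominal bits only: per-block scales and offsets used by real
    quantization formats are ignored (an assumption of this sketch).
    """
    # Number of key elements (the value tensor has the same count).
    elems = n_layers * n_kv_heads * head_dim * n_tokens
    return elems * (k_bits + v_bits) // 8


# Hypothetical model shape, chosen only to make the arithmetic easy to follow.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)

fp16 = kv_cache_bytes(**shape, k_bits=16, v_bits=16)
mixed = kv_cache_bytes(**shape, k_bits=6, v_bits=4)

print(f"f16 keys + f16 values: {fp16 / 2**20:.0f} MiB")
print(f"q6 keys + q4 values:   {mixed / 2**20:.0f} MiB")
```

Under those assumptions, 6-bit keys with 4-bit values take about 31% of the f16 cache size, while keeping the extra precision on the keys.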