| ▲ | kennethops 10 hours ago | |
Do you know if they did this to it? https://research.google/blog/turboquant-redefining-ai-effici... | ||
| ▲ | kgeist 10 hours ago | parent [-] | |
Llama.cpp already uses an idea from it internally for the KV cache [0]. So a quantized KV cache should now see less degradation. | ||