KV quantization has long been available in llama.cpp
Yes, but the optimisation described has not, right?