Is this something that will show up in Ollama any time soon to increase the context size of local models?
KV cache quantization has long been available in llama.cpp.
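For reference, a sketch of how it's typically enabled in recent llama.cpp builds via the cache-type flags (exact flags and supported quant types can vary by version, and the model path here is a placeholder):

  # Quantize both halves of the KV cache to q8_0, roughly halving its memory
  # footprint so a larger context fits in the same VRAM.
  # -fa enables flash attention, which is required for the V cache to be quantized.
  ./llama-server -m model.gguf -c 32768 -fa \
      --cache-type-k q8_0 --cache-type-v q8_0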
Yes, but the optimisation described here hasn't been, right?