I switched from llama.cpp to vLLM because of prompt cache bugs with Qwen and Gemma models.
This issue is a good starting point, with a bunch of linked/related issues:
https://github.com/ggml-org/llama.cpp/issues/22746