satvikpendem 5 hours ago
Just use llama.cpp, or Unsloth Studio, which wraps it. I don't know why anyone uses Ollama anymore.
verdverm 3 hours ago | parent
I switched from llama.cpp to vLLM because of prompt cache bugs in Qwen/Gemma models. This is a good starting issue with a bunch of linked/related