Remix clone Hacker News

	▲	satvikpendem 5 hours ago
		Just use llama.cpp, or Unsloth Studio, which wraps it. I don't know why anyone uses Ollama anymore.
	▲	verdverm 3 hours ago
		I switched from llama.cpp to vLLM because of prompt cache bugs in Qwen/Gemma models. This is a good starting issue, with a bunch of linked/related ones: https://github.com/ggml-org/llama.cpp/issues/22746