steren 7 days ago:

> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per second, and Ollama came out on top. We hope to be able to publish these results soon.
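For anyone who wants to reproduce a tokens-per-second number themselves, here is a minimal single-stream sketch against Ollama's native /api/generate endpoint (the localhost URL is Ollama's default; the model name is a placeholder), using the eval_count and eval_duration fields Ollama returns:

    import time
    import requests

    # Single-stream tokens/sec probe against Ollama's native API.
    # Assumes Ollama is running locally and the model is already pulled.
    URL = "http://localhost:11434/api/generate"  # Ollama's default port
    MODEL = "llama3"  # placeholder; substitute whatever model you serve

    t0 = time.monotonic()
    resp = requests.post(URL, json={
        "model": MODEL,
        "prompt": "Explain paged attention in one paragraph.",
        "stream": False,
    }, timeout=300)
    wall = time.monotonic() - t0
    body = resp.json()

    # Ollama reports generation stats; eval_duration is in nanoseconds.
    tokens = body["eval_count"]
    gen_s = body["eval_duration"] / 1e9
    print(f"{tokens} tokens, {tokens / gen_s:.1f} tok/s generation, "
          f"{wall:.2f}s wall clock (includes model load on a cold start)")

The same timing loop can be pointed at a vLLM or llama.cpp server via their OpenAI-compatible /v1/completions endpoints.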
ekianjo 7 days ago:

You need to benchmark against llama.cpp as well.
apitman 7 days ago:

Did you test multi-user cases?
sbinnee 6 days ago:

vLLM and Ollama assume different settings and hardware. vLLM, backed by PagedAttention, expects many requests from multiple users, whereas Ollama is usually for a single user on a local machine.
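To make the multi-user distinction measurable, a rough sketch (the port, model name, and request count are assumptions) that fires N concurrent requests at an OpenAI-compatible /v1/completions endpoint, which vLLM serves by default, and reports aggregate throughput:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    # Aggregate-throughput probe: N concurrent "users" against an
    # OpenAI-compatible completions endpoint.
    URL = "http://localhost:8000/v1/completions"  # vLLM's default port
    MODEL = "placeholder-model"                   # substitute the served model
    N = 16                                        # simulated concurrent users

    def one_request(i: int) -> int:
        resp = requests.post(URL, json={
            "model": MODEL,
            "prompt": f"User {i}: summarize paged attention briefly.",
            "max_tokens": 128,
        }, timeout=300)
        # OpenAI-compatible servers report token counts under "usage".
        return resp.json()["usage"]["completion_tokens"]

    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=N) as pool:
        total = sum(pool.map(one_request, range(N)))
    wall = time.monotonic() - t0
    print(f"{N} concurrent requests: {total} tokens in {wall:.1f}s "
          f"({total / wall:.1f} aggregate tok/s)")

A continuous-batching server like vLLM should hold aggregate tok/s roughly steady as N grows, while a single-stream server's latency degrades closer to linearly.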