Remix clone Hacker News

new | show | ask | jobs Github

	▲	throwawaymaths a year ago
		Yeah but I think of you've got a GPU you should probably think about using vllm. Last I tried using llama.cpp (which granted was several months ago) the ux was atrocious -- vllm basically gives you an openai api with no fuss. That's saying something as generally speaking I loathe Python.