pheggs 5 hours ago

you can pull directly from huggingface with llama.cpp, and it also has a decent web chat included
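For reference, llama.cpp's server binary can fetch a GGUF model straight from the Hugging Face Hub with the `-hf` flag and then serve its built-in web chat UI; the repo name below is just an example, and the default port is 8080:

```shell
# Download a GGUF model from Hugging Face (cached locally on first run)
# and serve it, web chat UI included, at http://localhost:8080.
llama-server -hf ggml-org/gemma-3-1b-it-GGUF
```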

speedgoose 5 hours ago | parent [-]

Does it have a model registry with an API and hot swapping, or do you still have to use something like llama-swap as suggested in the article? Or is it CLI-only?

dminik 5 hours ago | parent [-]

You can now serve multiple models, with loading/unloading, using just the server binary.

https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...

speedgoose 4 hours ago | parent [-]

Then it only lacks automatic FIFO loading/unloading. Maybe that will arrive in a few weeks.
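The FIFO behavior being asked for can be sketched in a few lines: keep at most N models resident, and when a request arrives for an unloaded one, evict the model loaded longest ago. This is a hypothetical illustration of the policy, not llama.cpp code; `ModelPool`, `loader`, and `unloader` are names invented here.

```python
from collections import OrderedDict

class ModelPool:
    """Hypothetical sketch: at most `capacity` models loaded at once.

    On a miss with a full pool, the model loaded *earliest* is
    unloaded first (FIFO -- access order does not matter).
    """

    def __init__(self, capacity, loader, unloader):
        self.capacity = capacity
        self.loader = loader      # name -> model handle (assumed callable)
        self.unloader = unloader  # model handle -> None  (assumed callable)
        self.loaded = OrderedDict()  # insertion order == load order

    def get(self, name):
        if name in self.loaded:
            # FIFO, not LRU: a hit does NOT move the model to the back.
            return self.loaded[name]
        if len(self.loaded) >= self.capacity:
            # Evict the oldest-loaded model to make room.
            _, oldest = self.loaded.popitem(last=False)
            self.unloader(oldest)
        handle = self.loader(name)
        self.loaded[name] = handle
        return handle
```

For example, with `capacity=2`, loading `a`, then `b`, then requesting `a` (a hit), then `c` evicts `a`, since `a` was loaded first, leaving `b` and `c` resident. An LRU policy would instead have evicted `b`.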