speedgoose | 5 hours ago
Does it have a model registry with an API and hot swapping, or do you still have to use something like llama-swap, as suggested in the article? Or is it CLI-only?
dminik | 5 hours ago | parent
You can now serve multiple models, with loading and unloading, using just the server binary: https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...