Remix clone Hacker News

	▲	dminik 5 hours ago
		You can have multiple models served now with loading/unloading with just the server binary. https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
	▲	speedgoose 4 hours ago | parent
		It only lacks the automatic FIFO loading/unloading then. Maybe it will be there in a few weeks.
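"Automatic FIFO loading/unloading" here means a server that keeps at most N models resident and, when a request names a model that is not loaded, unloads whichever model was loaded earliest to make room. The sketch below illustrates only that eviction policy. It is not llama.cpp's implementation or API. The class name `FifoModelPool` and the `load`/`unload` callbacks are hypothetical stand-ins for whatever actually starts or stops a model.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

M = TypeVar("M")


class FifoModelPool(Generic[M]):
    """Hypothetical pool: keep at most `capacity` models loaded, evict oldest-loaded first."""

    def __init__(self, capacity: int, load: Callable[[str], M],
                 unload: Callable[[str, M], None]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self._load = load          # hypothetical: loads a model by name
        self._unload = unload      # hypothetical: frees a loaded model
        self._loaded: "OrderedDict[str, M]" = OrderedDict()  # insertion order = load order

    def get(self, name: str) -> M:
        # FIFO: a hit does NOT move the model to the back of the queue
        # (doing that would turn this into LRU instead).
        if name in self._loaded:
            return self._loaded[name]
        if len(self._loaded) >= self.capacity:
            oldest, model = self._loaded.popitem(last=False)  # first-loaded model
            self._unload(oldest, model)
        model = self._load(name)
        self._loaded[name] = model
        return model

    def loaded(self) -> list[str]:
        return list(self._loaded)


# Usage: capacity 2; requesting "c" evicts "a" even though "a" was just used.
events: list[str] = []
pool = FifoModelPool(
    2,
    load=lambda n: events.append(f"load {n}") or n,
    unload=lambda n, m: events.append(f"unload {n}"),
)
for name in ["a", "b", "a", "c"]:
    pool.get(name)
print(pool.loaded())  # ['b', 'c']
```

Plain FIFO is the simplest policy to implement. A real server might prefer LRU, which keeps recently used models resident, or might account for per-model memory rather than counting models.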