llmtosser 7 days ago

This is not true.

No inference engine does all of:

- Model switching

- Unload after idle

- Dynamic layer offload to CPU to avoid OOM

ekianjo 7 days ago | parent [-]

This can already be added to llama.cpp with llama-swap, so even without Ollama you are not far off.
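
To illustrate the idea, here is a sketch of what a llama-swap setup for the first two bullets might look like (model paths and flag values are placeholders, not a drop-in config; check the llama-swap README for the exact schema):

```yaml
# Sketch of a llama-swap config: one llama-server process per model,
# started on demand when a request names that model.
models:
  "llama3-8b":
    # llama-swap substitutes ${PORT} with the port it assigns
    cmd: llama-server --model /models/llama3-8b.gguf --port ${PORT} -ngl 32
    ttl: 300   # unload after 300 s with no requests (idle unload)
```

Model switching then just means sending a request with a different `model` name. The third bullet is the weak spot: `-ngl` controls how many layers go to the GPU, but in stock llama.cpp that split is fixed at load time rather than adjusted dynamically to avoid OOM.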