llmtosser 7 days ago
This is not true. No inference engine does all of:

- Model switching
- Unload after idle
- Dynamic layer offload to CPU to avoid OOM
ekianjo 7 days ago
This can be added to llama.cpp with llama-swap currently, so even without Ollama you are not far off.
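For reference, a minimal sketch of what that llama-swap config might look like (model names and paths are placeholders; `cmd`, `ttl`, and the `${PORT}` macro come from the llama-swap README, but check the current docs for exact field names). Note the third bullet is only approximated here: `-ngl` sets GPU layer count statically per model rather than adjusting it dynamically at runtime, which is why this gets you "not far off" rather than all the way.

    # config.yaml for llama-swap -- a sketch, not a tested setup
    models:
      "qwen-7b":
        # llama-swap starts/stops llama-server on demand (model switching)
        cmd: llama-server --model /models/qwen-7b.gguf --port ${PORT} -ngl 99
        ttl: 300   # unload after 300s idle
      "llama-8b":
        # lowering -ngl keeps more layers on CPU to avoid OOM on this model
        cmd: llama-server --model /models/llama-8b.gguf --port ${PORT} -ngl 20
        ttl: 300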