nikodunk | 5 hours ago
Having read the above article, I just gave llama.cpp a shot. It is as easy as the author says now, though definitely not documented quite as well. My quickstart:

    brew install llama.cpp
    llama-server -hf ggml-org/gemma-4-E4B-it-GGUF --port 8000

Go to localhost:8000 for the web UI. On Linux it accelerates correctly on my AMD GPU, which Ollama failed to do, though of course everyone's mileage seems to vary on this.
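Beyond the web UI, llama-server also exposes an OpenAI-compatible HTTP API, so you can script against it once it's running. A minimal sketch, assuming the server from the quickstart above is up on port 8000 (the `"model"` value is largely ignored by a single-model llama-server, so any placeholder works):

```shell
# Send one chat completion request to a local llama-server
# (assumes llama-server is already listening on localhost:8000)
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```

The response comes back in the usual OpenAI chat-completion JSON shape, so most OpenAI client libraries work by just pointing their base URL at localhost:8000/v1.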
teekert | 4 hours ago | parent
Was hoping it was so easy :) But I probably need to look into it some more:

    llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
    llama_model_load_from_file_impl: failed to load model

Edit: @below, I used `nix-shell -p llama-cpp`, so it's not brew-related. Could indeed be an older version! I'll check.