freakynit 2 days ago
I have an older M1 Air with 8GB, but I'm still getting over 23 t/s on a 4B model, and the quality of the outputs is on par with top models of similar size.

1. Clone their forked repo: `git clone https://github.com/PrismML-Eng/llama.cpp.git`

2. Then build it (assuming you already have the Xcode build tools installed):
3. Finally, run it with (you can adjust the arguments):
The model was first downloaded from: https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main
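(The actual build and run commands were dropped from the comment above. As an illustration only, the standard upstream llama.cpp build and server invocation look roughly like this; the model path, port, and context size below are placeholder assumptions, not the author's exact arguments:)

```shell
# Standard llama.cpp CMake build (Metal backend is enabled by default
# on Apple Silicon, so no extra flags are needed for an M1).
cd llama.cpp
cmake -B build
cmake --build build --config Release -j

# Serve a local GGUF file over HTTP. The path and flags here are
# illustrative placeholders — adjust to your downloaded model.
./build/bin/llama-server -m ./models/model.gguf --port 8080 -c 4096
```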
freakynit 2 days ago | parent
To the author: why is this taking 4.56GB? I was expecting it to be under 1GB for a 4B model. https://ibb.co/CprTGZ1c And this is while serving zero prompts — I've only loaded the model (using llama-server).
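For what it's worth, ~4.5GB is roughly what the weights alone cost for a 4B-parameter model at 8-bit quantization; getting under 1GB would need around 2 bits per weight. A back-of-the-envelope sketch (the bits-per-weight figures for the GGUF quant types are approximate, and this ignores KV cache and runtime buffers):

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: params * bits / 8 bits-per-byte / 1e9."""
    return n_params * bits_per_weight / 8 / 1e9

n = 4e9  # a 4B-parameter model
# Approximate effective bits-per-weight for common GGUF quantizations.
for name, bits in [("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85), ("Q2_K", 2.6)]:
    print(f"{name:7s} ~{approx_size_gb(n, bits):.2f} GB")
```

At ~8.5 effective bits per weight (Q8_0), 4B parameters already come to about 4.25GB before any runtime overhead, which is in the ballpark of the 4.56GB observed.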