▲ andai 2 hours ago

Could you elaborate on what you did to get it working? I built it from source, but couldn't get it (the 4B model) to produce coherent English. Sample output below (the model's response to "hi" in the forked llama-cli):

> X ( Altern as the from (.. Each. ( the or,./, and, can the Altern for few the as ( (. . ( the You theb,'s, Switch, You entire as other, You can the similar is the, can the You other on, and. Altern. . That, on, and similar, and, similar,, and, or in
▲ freakynit 43 minutes ago

I have an older M1 Air with 8GB, but I'm still getting over 23 t/s on the 4B model, and the quality of the outputs is on par with top models of similar size.

1. Clone their forked repo: `git clone https://github.com/PrismML-Eng/llama.cpp.git`

2. Then build it (assuming you already have the Xcode build tools installed):
3. Finally, run it with (you can adjust arguments):
Model was first downloaded from: https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main
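The build and run commands appear to have been lost in formatting. For anyone filling in the gaps, a typical llama.cpp build-and-run sequence looks like the sketch below. To be clear, these are standard llama.cpp conventions, not the commenter's exact commands; the model filename and the flags are illustrative assumptions and may need adjusting.

```shell
# Step 1 (from the comment above): clone the fork
git clone https://github.com/PrismML-Eng/llama.cpp.git
cd llama.cpp

# Step 2: standard CMake build; Metal acceleration is enabled
# by default on Apple Silicon
cmake -B build
cmake --build build --config Release -j

# Step 3: run an interactive chat against the downloaded GGUF
# (model path is a placeholder -- point it at the file you
# downloaded from the Hugging Face link above)
./build/bin/llama-cli -m ~/models/bonsai-8b.gguf -cnv
```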
▲ jjcm an hour ago

I did this: https://image.non.io/2093de83-97f6-43e1-a95e-3667b6d89b3f.we...

Literally just downloaded the model into a folder, opened Cursor in that folder, and told it to get it running. Prompt:

> The gguf for bonsai 8b are in this local project. Get it up and running so I can chat with it. I don't care through what interface. Just get things going quickly. Run it locally - I have plenty of vram. https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main

I had to ask it to increase the context window size to 64k, but other than that it got it running just fine. After that I just told ngrok the port I was serving it on and voila.