simonw · 5 hours ago
I got this running locally using llama.cpp from Homebrew and the Unsloth quantized model like this:
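A sketch of the install step, assuming the standard Homebrew formula (which provides the llama-cli and llama-server binaries):

    brew install llama.cpp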
Then:
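A sketch of the run command: llama.cpp's -hf flag pulls a GGUF straight from Hugging Face, with unsloth/MODEL-GGUF standing in as a placeholder for the actual Unsloth repo name:

    # fetch the quantized GGUF from Hugging Face (cached locally)
    # and drop into an interactive terminal chat
    llama-cli -hf unsloth/MODEL-GGUF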
That opened a CLI chat interface. For a web UI on port 8080, along with an OpenAI-compatible chat completions endpoint, do this:
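A sketch of the server command, again with unsloth/MODEL-GGUF as a placeholder for the actual repo (llama-server defaults to port 8080 and exposes an OpenAI-compatible /v1/chat/completions route):

    # serves a browser chat UI at http://localhost:8080 and an
    # OpenAI-compatible API under /v1
    llama-server -hf unsloth/MODEL-GGUF --port 8080

You can then point any OpenAI-style client at it, for example:

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello"}]}'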
It's using about 28GB of RAM.
nubg · 3 hours ago
what's the tokens-per-second speed?
technotony · 3 hours ago
what are your impressions? | ||||||||