yjftsjthsd-h 6 hours ago

You might also try https://github.com/Mozilla-Ocho/llamafile , which may have better CPU-only performance than ollama. It does require you to grab .gguf files yourself (unless you use one of their prebuilts in which case it comes with the binary!), but with that done it's really easy to use and has decent performance.

For reference, this is how I run it:

  $ cat ~/.config/systemd/user/llamafile@.service
  [Unit]
  Description=llamafile with arbitrary model
  After=network.target
  
  [Service]
  Type=simple
  WorkingDirectory=%h/llms/
  ExecStart=sh -c "%h/.local/bin/llamafile -m %h/llamafile-models/%i.gguf --server --host '::' --port 8081 --nobrowser --log-disable"
  
  [Install]
  WantedBy=default.target

And then

  systemctl --user start llamafile@whatevermodel

but you can also just run that ExecStart command directly (with %h replaced by your home directory and %i by the model name) and it works.

chatmasta 4 hours ago | parent | next [-]

Be careful running this on work machines – it will get flagged by CrowdStrike Falcon and probably other EDR tools. In my case, the first time I tried it I just saw “Killed” and then got a DM from SecOps within two minutes.

SahAssar 5 hours ago | parent | prev [-]

Is that `--host` listening on non-local addresses? Might be good to default to local-only.
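
For reference, a rough sketch of a local-only variant (an assumption, not something posted in the thread). '::' is the IPv6 wildcard address, so the unit above accepts connections on every interface, and on a typical dual-stack Linux setup that includes IPv4. Swapping in a loopback address should restrict it to the local machine, assuming llamafile's --host accepts an IPv4 literal the same way it accepts '::':

  [Service]
  # Same command as the original unit; only the --host value is changed (assumed to be accepted as-is).
  ExecStart=sh -c "%h/.local/bin/llamafile -m %h/llamafile-models/%i.gguf --server --host 127.0.0.1 --port 8081 --nobrowser --log-disable"

Using '::1' instead would be the IPv6 loopback equivalent. After editing the unit, `systemctl --user daemon-reload` followed by a restart of the instance picks up the change.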