arendtio 7 hours ago

I tried building and using llama.cpp multiple times, and after a while, I got so frustrated with the frequently broken build process that I switched to ollama with the following script:

  #!/bin/sh
  export OLLAMA_MODELS="/mnt/ai-models/ollama/"
  
  printf 'Starting the server now.\n'
  ollama serve >/dev/null 2>&1 &
  serverPid="$!"
  
  printf 'Starting the client (might take a moment (~3min) after a fresh boot).\n'
  ollama run llama3.2 2>/dev/null

  printf 'Stopping the server now.\n'
  kill "$serverPid"

And it just works :-)
boneitis 6 hours ago

this was pretty much spot-on to my own experience and trajectory. the ridicule of people who choose ollama over llamacpp is so tired.

i had already burned an evening trying to debug and fix build issues, getting nowhere fast, until i pulled ollama and had it working with just two commands. it was a shock. (granted, there is/was a crippling performance problem with sky/kabylake chips, but it was mitigated if you had any kind of mid-tier GPU and tweaked a couple of settings.)
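
for reference, the two commands were roughly the following. i'm sketching from memory here and assuming the official linux install script plus the same llama3.2 model mentioned above, so treat it as an illustration rather than a recipe:

  # sketch from memory: install ollama, then pull and run a model in one step
  curl -fsSL https://ollama.com/install.sh | sh
  ollama run llama3.2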

anyone who tries to contribute to the general knowledge base of deploying llamacpp (like TFA) is doing heaven's work.

SteelPh0enix 3 hours ago

I have spent unreasonable amounts of time building llama.cpp for my hardware setup (AMD GPU) on both Windows and Linux. That was one of my main reasons for writing that blog post. Lmao.
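
For reference, a ROCm build on Linux ends up looking roughly like this. Treat it as a sketch only: the CMake flag names have shifted between llama.cpp releases, and the gfx target (gfx1100 below) is an assumption that depends on your GPU.

  # rough sketch of a HIP/ROCm build; flag names vary by release,
  # and the gfx target below is an assumed value for an RDNA3 card
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  HIPCXX=/opt/rocm/llvm/bin/clang++ cmake -B build \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS=gfx1100 \
    -DCMAKE_BUILD_TYPE=Release
  cmake --build build -j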