Remix.run Logo
smcleod 5 hours ago

Neat to see more folks writing blogs on their experiences. This however does seem like it's an over-complicated method of building llama.cpp.

Assuming you want to do this iteratively (at least for the first time) should only need to run:

  ccmake .
And toggle the parameters your hardware supports or that you want (e.g. if CUDA if you're using Nvidia, Metal if you're using Apple etc..), and press 'c' (configure) then 'g' (generate), then:

  cmake --build . -j $(expr $(nproc) / 2)

Done.

If you want to move the binaries into your PATH, you could then optionally run cmake install.

lukev 21 minutes ago | parent | next [-]

Actually I think even this makes it look scarier than it is if you're on an M-series Apple.

In that case, the steps to building llama.cpp are:

1. Clone the repo.

2. Run `make`.

To start chatting with a model all you need is to:

1. Download the model you want in gguf format that will fit into your hardware (probably the hardest step, but readily available on HuggingFace)

2. Run `./llama-server -m model.gguf`.

3. Visit localhost:8080

SteelPh0enix 3 hours ago | parent | prev | next [-]

Wow, i did not know about ccmake. I'll check it out and edit the post if it's really that easy to use, thanks.

moffkalast 2 hours ago | parent | prev [-]

Yeah the mingw method on windows is a ludicrous thing to even think about, and llama.cpp still has that as the suggested option in the readme for some weird reason. Endless sourcing of paths that never works quite right. I literally couldn't get it to work when I first tried it last year.

Meanwhile Cmake is like two lines and somehow it's the backup fallback option? I don't get it. And well on linux it's literally just one line with make.

blharr 10 minutes ago | parent [-]

Building anything on windows without cmake is just... I don't know why anyone would use anything else. I used to spend hours wrestling with build failures, but after setting up cmake, it just works with everything.