kennywinker 2 hours ago

I’ve read all the stuff about how llama.cpp is much faster and better than Ollama, and I believe it, but good god, llama.cpp isn’t user-friendly.

You’d think in an era where “code is free” there would be an easier story around running local AI than compiling llama.cpp by hand and then spending hours researching flags, only for it to crash with an OOM error every ten prompts or so.

greenavocado 2 hours ago

You're supposed to use a cheap ChatGPT subscription to run optimization loops over llama.cpp flags with a self-contained reproducible benchmark script and just let it burn for hours/days until it is fully optimized ))))
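For anyone who wants the non-joke version: a rough sketch of what such an optimization loop might look like, as a plain grid search. Everything here is an assumption for illustration: the helper names (`flag_grid`, `to_argv`, `time_command`, `best_flags`) are made up, the flag names in the usage note are placeholders, and wall-clock time stands in for whatever metric your benchmark script actually reports. A run that exits non-zero (for example an OOM crash) is simply skipped.

```python
import itertools
import subprocess
import time


def flag_grid(space):
    """Yield every combination of flag values as a dict (plain grid search)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def to_argv(base_cmd, flags):
    """Append each flag and its value to the base command."""
    argv = list(base_cmd)
    for name, value in flags.items():
        argv += [name, str(value)]
    return argv


def time_command(argv):
    """Return wall-clock seconds for one run, or None if it exited non-zero."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if result.returncode != 0:
        return None  # treat crashes (e.g. OOM) as "this combination is unusable"
    return time.perf_counter() - start


def best_flags(base_cmd, space, run=time_command):
    """Try every combination and return (flags, seconds) for the fastest one.

    Returns None if every combination failed.
    """
    best = None
    for flags in flag_grid(space):
        seconds = run(to_argv(base_cmd, flags))
        if seconds is None:
            continue
        if best is None or seconds < best[1]:
            best = (flags, seconds)
    return best
```

You would call it with something like `best_flags(["./my_benchmark.sh"], {"-t": [4, 8], "-b": [256, 512]})`, where `./my_benchmark.sh` is your own reproducible benchmark script and the flag names and values are whatever your build actually accepts. A grid search gets expensive fast as you add flags, so this only makes sense for a handful of values per flag.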