I just build llama.cpp from scratch on the PR that has MTP drafters.

https://github.com/ggml-org/llama.cpp/pull/23398

Please don't use Ollama, it's a bad actor in the OSS community.

I don't have the energy to build stuff all the time, that's a rabbit-hole side tunnel I don't really want to get into. I have larger concerns in my life that are more urgent than developing that side of things.

But I've moved on from Ollama for the time being, though I am mainly interested to see what the Gemma 4 MTP speeds are like on my M1 Max, so I may test it.

I am quite impressed with the tools in LM Studio, which is also a beautiful app, but it is not open source (which challenges my personal strategy somewhat) and I dread its inevitable enshittification.

Nevertheless the GUI has been very helpful while I learn, and I will probably use it until something else presents or my usage pattern settles down from experimentation to something a bit more routine.

I will try oMLX, too, but judging by the LiteRT page I may soon be able to just use that for the larger models if I end up settling with Gemma 4.

▲

thot_experiment 38 minutes ago | parent [-]

Totally understandable. YMMV but I found the llama.cpp build process to work on the first try on my machine, and it only takes a couple minutes, which definitely isn't my usual expectation or experience. I was very pleasantly surprised. Their web-ui is also getting very polished while still doing a great job of letting you tweak all the weird settings.

	▲	dofm 15 minutes ago \| parent [-]
		Sorry, I sounded a bit terse there! You have probably convinced me to give it a try, to be honest. It's just that, to cut a long story short, I am currently recovering from a level of burnout so severe that twelve months ago had me fully convinced I was actually in early-onset cognitive decline (I am a bit over fifty). Only a little over two months ago I was still sure I'd have to quit IT and find a slow job because I was so out of the loop; this whole industry shift even in just the last few months is so shocking and strange. So I have to be a bit cautious about how many indirections I add, if that makes sense. But I am compiling bigger projects than llama.cpp so I will give it a go. Thank you for the extra detail.