thot_experiment an hour ago
I just build llama.cpp from source on the PR that adds MTP drafters: https://github.com/ggml-org/llama.cpp/pull/23398. Please don't use Ollama; it's a bad actor in the OSS community.
dofm an hour ago | parent
I don't have the energy to build stuff all the time; that's a rabbit hole I don't really want to go down. I have larger, more urgent concerns in my life than developing that side of things. That said, I've moved on from Ollama for now. I'm mainly curious what Gemma 4 MTP speeds are like on my M1 Max, so I may test it. I'm quite impressed with the tools in LM Studio, which is also a beautiful app, but it isn't open source (which somewhat conflicts with my personal strategy), and I dread its inevitable enshittification. Still, the GUI has been very helpful while I learn, and I'll probably keep using it until something better comes along or my usage settles down from experimentation into something more routine. I'll try oMLX too, but judging by the LiteRT page, I may soon be able to just use LiteRT for the larger models if I end up settling on Gemma 4.