| ▲ | Schekin a day ago | |
This matches my experience. The weights usually arrive before the runtime stack fully catches up. I tried Gemma locally on Apple Silicon yesterday — promising model, but Ollama felt like more of a bottleneck than the model itself. I had noticeably better raw performance with mistralrs (i find it on reddit then github), but the coding/tool-use workflow felt weaker. So the tradeoff wasn’t really model quality — it was runtime speed vs workflow maturity. | ||