Remix clone Hacker News

	▲	Schekin a day ago
		This matches my experience. The weights usually arrive before the runtime stack fully catches up. I tried Gemma locally on Apple Silicon yesterday: promising model, but Ollama felt like more of a bottleneck than the model itself. I got noticeably better raw performance with mistralrs (I found it on Reddit, then GitHub), but the coding/tool-use workflow felt weaker. So the tradeoff wasn’t really about model quality; it was runtime speed vs. workflow maturity.