terhechte · 5 hours ago:
Thank you for NeoVim! I also use it every day, mostly for thinking, notes, and Markdown these days. Have you compared against MLX? Sometimes I'm getting much faster responses, but it feels like the quality is worse (e.g., tool calls not working).
tarruda · 4 hours ago:
> Have you compared against MLX?

I don't think MLX supports similar 2-bit quants, so I never tried the 397B model with MLX. However, I did try 4-bit MLX with other Qwen 3.5 models, and yes, it is significantly faster. I still prefer llama.cpp because it is an all-in-one package:

- SOTA dynamic quants (especially ik_llama.cpp)
- an amazing web UI with MCP support
- Anthropic/OpenAI-compatible endpoints (so it can be used with virtually any harness)
- JSON-constrained output, which basically ensures tool-call correctness (see the sketch after this list)
- routing mode
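As a rough illustration of the last two points, here is a minimal Python sketch of a schema-constrained request against llama-server's OpenAI-compatible endpoint. It assumes a llama-server instance running on the default localhost:8080 and OpenAI-style `json_schema` support in `response_format`; the schema, model name, and message content are made up for the example.

```python
# Minimal sketch: POST to llama-server's OpenAI-compatible chat endpoint
# with a JSON schema, so the grammar-constrained decoder can only emit
# output matching the schema. Assumes llama-server on localhost:8080
# (the default port); the "tool_call" schema below is hypothetical.
import json
import requests

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["name", "arguments"],
}

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # placeholder; llama-server serves whatever model it loaded
        "messages": [
            {"role": "user", "content": "Call the weather tool for Berlin."}
        ],
        # OpenAI-style structured output; llama.cpp maps this to a grammar
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "tool_call", "schema": schema},
        },
    },
)

# The content is guaranteed to parse as JSON matching the schema,
# which is what makes tool calls reliable even with small quants.
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```

Because the constraint is enforced at sampling time rather than by prompting, malformed tool calls are ruled out structurally; any OpenAI- or Anthropic-compatible harness can point at the same endpoint unchanged.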