	petercooper 3 hours ago
		So the prompt goes in 4x as fast, but tokens generate more slowly. I'd take that tradeoff. On my M3 Ultra, token generation is surprisingly fast, but prompt processing is slow enough to make it painful except as a fallback or for experimentation, especially with agentic coding tools.
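
		To put rough numbers on why that tradeoff can be worth it, here is a back-of-the-envelope sketch. Every figure in it is made up for illustration (none are measurements from an M3 Ultra or any other machine), and `total_seconds` is just a hypothetical helper. It assumes a long prompt and a short reply, which is roughly what I'd expect from agentic coding tools.

```python
def total_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Time for one request: prompt processing time plus generation time."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical workload: 20k-token prompt, 500-token reply.
# Hypothetical speeds in tokens per second; illustrative only.
baseline = total_seconds(20_000, 500, prefill_tps=200, decode_tps=50)

# Same workload with 4x faster prompt processing but slower generation.
faster_prefill = total_seconds(20_000, 500, prefill_tps=800, decode_tps=40)

print(f"baseline:       {baseline:.1f} s")
print(f"faster prefill: {faster_prefill:.1f} s")
```

		With a prompt that much longer than the reply, the prompt processing term dominates, so a 4x gain there can outweigh a modest drop in generation speed. With short prompts and long replies the balance would tip the other way.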