behnamoh 7 hours ago
> That said, faster inference can't come soon enough.

Why is that? Technical limits? I know Cerebras struggles with compute, and they stopped their coding plan (sold out!). Their architecture also hasn't been used with large models like GPT-5.2. The largest model they support (unquantized) is GLM 4.7, which is <500B params.