Prefill has a lot of parallelism, and so does decode once the context gets large (very common with agentic tasks): every new token's attention has to scan the entire KV cache, and that scan parallelizes well across the sequence dimension. People like to say "old inference chips are no good for LLM use," but that isn't really true.
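To make the decode-side parallelism concrete: single-token attention over a long KV cache can be split into chunks that are computed independently and then merged with a numerically stable log-sum-exp combine (the idea behind flash-decoding-style kernels). A minimal NumPy sketch, with illustrative function names, assuming a single query vector and an unbatched cache:

```python
import numpy as np

def full_attention(q, K, V):
    # Reference: plain softmax attention for one decode-step query.
    s = K @ q                          # scores over the whole cache, shape (n,)
    w = np.exp(s - s.max())
    return (w / w.sum()) @ V           # weighted sum of values, shape (d,)

def chunked_attention(q, K, V, chunk=4):
    # Each KV chunk below is independent, so on real hardware the
    # chunks can run in parallel; only the final combine is serial.
    n, _ = K.shape
    partials = []
    for i in range(0, n, chunk):
        s = K[i:i + chunk] @ q         # scores for this chunk
        m = s.max()                    # per-chunk max for stability
        w = np.exp(s - m)
        partials.append((m, w.sum(), w @ V[i:i + chunk]))
    # Merge partials: rescale each by exp(m_i - m_global), then divide.
    m_glob = max(m for m, _, _ in partials)
    denom = sum(z * np.exp(m - m_glob) for m, z, _ in partials)
    numer = sum(o * np.exp(m - m_glob) for m, _, o in partials)
    return numer / denom

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
assert np.allclose(full_attention(q, K, V), chunked_attention(q, K, V))
```

The longer the context, the more chunks there are to farm out, which is why long-context decode keeps wide, older accelerators busier than the "decode is purely serial" framing suggests.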