martinald an hour ago:
Yes, potentially - but the OG TPUs were actually poorly suited for LLM usage: they were designed for far smaller models with more parallelism in execution. They've obviously adapted the design since, but optimising in hardware like that is a risk - if there's another jump in model architecture, a narrowly specialised set of hardware may not generalise well enough.
zozbot234 an hour ago (reply):
Prefill has a lot of parallelism, and so does decode with a larger context (very common with agentic tasks). People like to say "old inference chips are no good for LLM use", but that's not really true.
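The parallelism point can be made concrete with a back-of-envelope arithmetic-intensity calculation (a hedged sketch, not from the thread - the function name and the token/model-dimension numbers are illustrative). Prefill pushes the whole prompt through each weight matrix at once (a matrix-matrix multiply), while naive single-stream decode pushes one token per step (matrix-vector), so prefill does far more compute per byte of weights fetched:

```python
def arithmetic_intensity(tokens: int, d_model: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one (tokens x d_model) @ (d_model x d_model) matmul.

    Illustrative model: weight reads dominate memory traffic; activations ignored.
    """
    flops = 2 * tokens * d_model * d_model          # multiply-accumulate counts as 2 FLOPs
    weight_bytes = d_model * d_model * bytes_per_param
    return flops / weight_bytes

# Hypothetical sizes: a 2048-token prompt, d_model = 4096, fp16 weights.
prefill = arithmetic_intensity(tokens=2048, d_model=4096)  # whole prompt in parallel
decode = arithmetic_intensity(tokens=1, d_model=4096)      # one token per step
print(prefill, decode)  # prefill does ~2048x more FLOPs per weight byte than decode
```

With bytes_per_param = 2, the ratio simplifies to exactly the token count, which is why prefill (and batched or long-context decode, where attention reads scale with context) keeps matmul units busy in a way single-token decode does not.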