martinald an hour ago:
Yes, potentially - but the OG TPUs were actually poorly suited for LLM usage: they were designed for far smaller models with more parallelism in execution. They've obviously adapted the design since, but optimising in hardware like that is a risk - if there's another jump in model architecture, a narrowly specialised set of hardware may not generalise well enough.
zozbot234 an hour ago (reply):
Prefill has a lot of parallelism, and so does decode with a larger context (very common with agentic tasks). People like to say "old inference chips are no good for LLM use", but that's not really true.
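The parallelism point can be made concrete with a back-of-envelope arithmetic-intensity calculation (a hedged sketch, not from the thread - the function name and the token/model-dimension numbers are illustrative). Prefill pushes the whole prompt through each weight matrix at once (a matrix-matrix multiply), while naive single-stream decode pushes one token per step (matrix-vector), so prefill does far more compute per byte of weights fetched:

```python
def arithmetic_intensity(tokens: int, d_model: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for one (tokens x d_model) @ (d_model x d_model) matmul.

    Illustrative model: weight reads dominate memory traffic; activations ignored.
    """
    flops = 2 * tokens * d_model * d_model          # multiply-accumulate counts as 2 FLOPs
    weight_bytes = d_model * d_model * bytes_per_param
    return flops / weight_bytes

# Hypothetical sizes: a 2048-token prompt, d_model = 4096, fp16 weights.
prefill = arithmetic_intensity(tokens=2048, d_model=4096)  # whole prompt in parallel
decode = arithmetic_intensity(tokens=1, d_model=4096)      # one token per step
print(prefill, decode)  # prefill does ~2048x more FLOPs per weight byte than decode
```

With bytes_per_param = 2, the ratio simplifies to exactly the token count, which is why prefill (and batched or long-context decode, where attention reads scale with context) keeps matmul units busy in a way single-token decode does not.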