Prefill has a lot of parallelism, and so does decode once the context gets large (very common with agentic tasks): every new token's attention has to scan the entire KV cache, and that scan parallelizes well across the sequence dimension. People like to say "old inference chips are no good for LLM use," but that isn't really true.
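To make the decode-side parallelism concrete: single-token attention over a long KV cache can be split into chunks that are computed independently and then merged with a numerically stable log-sum-exp combine (the idea behind flash-decoding-style kernels). A minimal NumPy sketch, with illustrative function names, assuming a single query vector and an unbatched cache:

```python
import numpy as np

def full_attention(q, K, V):
    # Reference: plain softmax attention for one decode-step query.
    s = K @ q                          # scores over the whole cache, shape (n,)
    w = np.exp(s - s.max())
    return (w / w.sum()) @ V           # weighted sum of values, shape (d,)

def chunked_attention(q, K, V, chunk=4):
    # Each KV chunk below is independent, so on real hardware the
    # chunks can run in parallel; only the final combine is serial.
    n, _ = K.shape
    partials = []
    for i in range(0, n, chunk):
        s = K[i:i + chunk] @ q         # scores for this chunk
        m = s.max()                    # per-chunk max for stability
        w = np.exp(s - m)
        partials.append((m, w.sum(), w @ V[i:i + chunk]))
    # Merge partials: rescale each by exp(m_i - m_global), then divide.
    m_glob = max(m for m, _, _ in partials)
    denom = sum(z * np.exp(m - m_glob) for m, z, _ in partials)
    numer = sum(o * np.exp(m - m_glob) for m, _, o in partials)
    return numer / denom

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
assert np.allclose(full_attention(q, K, V), chunked_attention(q, K, V))
```

The longer the context, the more chunks there are to farm out, which is why long-context decode keeps wide, older accelerators busier than the "decode is purely serial" framing suggests.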