GeekyBear 7 hours ago

In previous generations, throughput was excellent for an integrated GPU, but the time to first token was lacking.

danudey 7 hours ago | parent [-]

So throughput was already good but TTFT was the metric that needed more improvement?

zamadatix 6 hours ago | parent | next [-]

To add to the sibling's "good is relative" point: it also depends on what you're running, not just on your own tolerance for what counts as good. E.g. with an MoE, decode gets faster, so the prompt processing delay becomes more noticeable for the same size model in RAM.
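A rough back-of-the-envelope sketch of that point, in Python. Every number here is made up for illustration, the `latency_breakdown` helper is hypothetical, and it assumes prefill speed stays the same while only decode gets faster, which is a simplification:

```python
def latency_breakdown(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Return (ttft_seconds, decode_seconds, prefill_share_of_total).

    Simplified model: TTFT is approximated as prompt_tokens / prefill_tps,
    and generation time as output_tokens / decode_tps.
    """
    ttft = prompt_tokens / prefill_tps
    decode = output_tokens / decode_tps
    return ttft, decode, ttft / (ttft + decode)


# Hypothetical numbers, not measurements from any real hardware.
# Same prompt, same output length, same prefill rate (an assumption);
# only the decode rate differs between the two cases.
dense = latency_breakdown(4000, 500, prefill_tps=200.0, decode_tps=10.0)
moe = latency_breakdown(4000, 500, prefill_tps=200.0, decode_tps=40.0)

for name, (ttft, decode, share) in (("dense", dense), ("moe", moe)):
    print(f"{name}: ttft={ttft:.1f}s decode={decode:.1f}s prefill share={share:.0%}")
```

The wait before the first token is identical in both cases, but once decode is quicker that wait makes up a much larger share of the total response time, so it stands out more.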

convenwis 7 hours ago | parent | prev [-]

Good is relative, but time to first token was clearly the biggest limitation.