| ▲ | GeekyBear 7 hours ago |
| In previous generations, throughout was excellent for an integrated GPU, but the time to first token was lacking. |
|
| ▲ | danudey 7 hours ago | parent [-] |
| So throughput was already good but TTFT was the metric that needed more improvement? |
| |
| ▲ | zamadatix 6 hours ago | parent | next [-] | | To add to the sibling "good is relative" it also depends what you're running, not just your relative tolerances of what good is. E.g. in a MoE the decode speedup means the speed of prompt processing delay is more noticeable for the same size model in RAM. | |
| ▲ | convenwis 7 hours ago | parent | prev [-] | | Good is relative but first token was clearly the biggest limitation. |
|