| ▲ | danudey 7 hours ago | |
So throughput was already good but TTFT was the metric that needed more improvement? | ||
| ▲ | zamadatix 6 hours ago | parent | next [-] | |
To add to the sibling "good is relative" it also depends what you're running, not just your relative tolerances of what good is. E.g. in a MoE the decode speedup means the speed of prompt processing delay is more noticeable for the same size model in RAM. | ||
| ▲ | convenwis 7 hours ago | parent | prev [-] | |
Good is relative but first token was clearly the biggest limitation. | ||