mips_avatar 5 days ago
It's weird they don't document this stuff. Understanding things like tool call latency and time to first token is extremely important in application development.
eru 4 days ago
Humans often answer with fluff like "That's a good question, thanks for asking that, [fluff, fluff, fluff]" to give themselves more breathing room until the first 'token' of their real answer. I wonder if any LLMs are doing stuff like that for latency hiding?