eru 4 days ago:
Humans often answer with fluff like "That's a good question, thanks for asking, [fluff, fluff, fluff]" to give themselves more breathing room until the first 'token' of their real answer. I wonder if any LLMs are doing stuff like that for latency hiding?
mips_avatar 4 days ago:
I don't think the models themselves are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing it. Particularly in voice, it's worth using a smaller local LLM to handle the acknowledgment before handing off to the main model.
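A minimal asyncio sketch of that pattern, assuming two stand-in coroutines: `small_model_ack` and `main_model_answer` are hypothetical placeholders for a fast local model and a slower primary LLM, with `asyncio.sleep` standing in for their inference latency.

```python
import asyncio

async def small_model_ack(query: str) -> str:
    # Hypothetical fast local model: returns a short acknowledgment
    # almost immediately, before the main model has produced anything.
    await asyncio.sleep(0.05)  # stands in for ~50 ms of local inference
    return "Good question, give me a second..."

async def main_model_answer(query: str) -> str:
    # Hypothetical slower primary LLM with a long time to first token.
    await asyncio.sleep(2.0)  # stands in for remote inference latency
    return f"Here is the real answer to: {query}"

async def respond(query: str) -> None:
    # Start the slow model first so it works in the background...
    answer_task = asyncio.create_task(main_model_answer(query))
    # ...then speak the acknowledgment while we wait, hiding the latency.
    print(await small_model_ack(query))
    print(await answer_task)

asyncio.run(respond("Why is the sky blue?"))
```

The key design choice is creating the slow task before awaiting the acknowledgment, so the user hears something within tens of milliseconds while the real answer is already being generated concurrently.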
strangegecko 4 days ago:
Do humans really do that often? Coming up with all that fluff would keep my brain busy, meaning there's actually no additional breathing room for thinking about an answer.