It's weird that they don't document this stuff. Understanding things like tool call latency and time to first token is extremely important in application development.

▲

eru 4 days ago | parent [-]

Humans often answer with fluff like "That's a good question, thanks for asking that, [fluff, fluff, fluff]" to give themselves more breathing room until the first 'token' of their real answer. I wonder if any LLMs are doing stuff like that for latency hiding?

▲

mips_avatar 4 days ago | parent | next [-]

I don't think the models are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing this. Particularly in voice, it's worth it to use a smaller local LLM to handle the acknowledgment before handing it off.
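
A rough sketch of that handoff pattern, in case it helps. Everything here is an assumption for illustration: `quick_ack` and `full_answer` are hypothetical stand-ins for a small local model and a larger, slower model, and the sleeps only simulate their latency. The point is just that the slow request starts first and the acknowledgment goes out while it is still running.

```python
import asyncio


async def quick_ack(prompt: str) -> str:
    # Hypothetical stand-in for a small, fast local model.
    await asyncio.sleep(0.01)
    return "Sure, let me check that."


async def full_answer(prompt: str) -> str:
    # Hypothetical stand-in for the larger, slower model.
    await asyncio.sleep(0.2)
    return f"Answer to: {prompt}"


async def respond(prompt: str, emit) -> None:
    # Kick off the slow model first so it works in the background.
    answer_task = asyncio.create_task(full_answer(prompt))
    # Emit the acknowledgment as soon as the fast model returns.
    emit(await quick_ack(prompt))
    # Then emit the real answer once the slow model finishes.
    emit(await answer_task)


def run_demo(prompt: str) -> list:
    # Collect emitted messages in order; a voice agent would speak them instead.
    out = []
    asyncio.run(respond(prompt, out.append))
    return out
```

Calling `run_demo("what time is it?")` returns the acknowledgment followed by the answer. The total wall time is roughly that of the slow model alone, since the two calls overlap.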

▲

strangegecko 4 days ago | parent | prev [-]

Do humans really do that often?

Coming up with all that fluff would keep my brain busy, meaning there's actually no additional breathing room for thinking about an answer.

▲

eru 4 days ago | parent [-]

People who professionally answer questions do that, yes. Eg politicians or press secretaries for companies, or even just your professor taking questions after a talk.

> Coming up with all that fluff would keep my brain busy, meaning there's actually no additional breathing room for thinking about an answer.

It gets a lot easier with practice: your brain caches a few of the typical fluff routines.