| ▲ | mips_avatar 4 days ago | |
I don't think the models are doing this, time to first token is more of a hardware thing. But people writing agents are definitely doing this, particularly in voice it's worth it to use a smaller local llm to handle the acknowledgment before handing it off. | ||