dnackoul 3 hours ago

I've generally observed latency of 500 ms to 1 s with modern LLM-based voice agents making real calls. That's good enough to have real conversations. I attended VAPI Con earlier this year, and a lot of the discussion centered on how interruptions and turn detection are the next frontier in making voice agents smoother conversationalists. Knowing when to speak is a hard problem even for humans, but when you listen to a lot of voice agent calls, the friction point right now tends to be either interrupting too often or waiting too long to respond. The major players are clearly working on this: Deepgram announced a new state-of-the-art turn-detection model (Flux) at the conference. Feels like an area where we'll see even more progress in the next year.
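To make the "interrupt too often vs. wait too long" friction concrete, here's a minimal sketch (not any vendor's API; `should_respond` and the frame layout are hypothetical) of the classic fixed-silence-threshold endpointer that semantic turn-detection models aim to replace. The single threshold is the whole trade-off: short and the agent barges in during a mid-sentence pause, long and it adds lag after every utterance.

```python
# Hypothetical sketch: naive endpointing via a fixed trailing-silence
# threshold. frames is a list of booleans, one per 20 ms audio frame,
# True meaning voice activity was detected in that frame.

def should_respond(frames, silence_threshold_ms, frame_ms=20):
    """Return True once trailing silence meets or exceeds the threshold."""
    trailing_silence = 0
    for is_speech in frames:
        if is_speech:
            trailing_silence = 0          # speech resets the silence clock
        else:
            trailing_silence += frame_ms  # accumulate trailing silence
    return trailing_silence >= silence_threshold_ms

# A caller pausing mid-sentence for 300 ms (15 silent 20 ms frames):
mid_sentence_pause = [True] * 10 + [False] * 15

# A 200 ms threshold fires during the pause (agent interrupts)...
print(should_respond(mid_sentence_pause, 200))  # True

# ...while an 800 ms threshold waits it out, but that same 800 ms is
# added as latency after every genuinely finished utterance.
print(should_respond(mid_sentence_pause, 800))  # False
```

Semantic turn detectors try to escape this dilemma by conditioning on what was said (an unfinished clause vs. a completed thought), not just on how long the silence has lasted.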
I've generally observed latency of 500ms to 1s with modern LLM-based voice agents making real calls. That's good enough to have real conversations. I attended VAPI Con earlier this year, and a lot of the discussion centered on how interruptions and turn detection are the next frontier in making voice agents smoother conversationalists. Knowing when to speak is a hard problem even for humans, but when you listen to a lot of voice agent calls, the friction point right now tends to be either interrupting too often or waiting too long to respond. The major players are clearly working on this. Deepgram announced a new SOTA (Flux) for turn detection at the conference. Feels like an area where we'll see even more progress in the next year. | ||