This has more to do with Voice Activity Detection (VAD) than with the latency described in the article.

lxgr 2 hours ago

That seems to be the issue: VAD is insufficient here.

Knowing when to respond requires semantic understanding, which probably only the model itself is capable of.
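To make the point concrete, here is a minimal sketch of silence-based endpointing, the kind of turn detection a plain VAD gives you. It assumes 16 kHz mono audio cut into 10 ms frames (160 samples) and uses raw RMS energy as the speech/non-speech test. The threshold and hangover values are illustrative, not taken from any particular product. Real VADs use better classifiers than raw energy, but a silence-timeout endpointer has the same blind spot: a thinking pause mid-sentence looks exactly like "I'm done."

```python
import math
from typing import Iterable, List, Optional


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def end_of_turn_frame(
    frames: Iterable[List[float]],
    energy_threshold: float = 0.02,  # illustrative value, an assumption
    hangover_frames: int = 25,       # 25 x 10 ms = 250 ms of silence, an assumption
) -> Optional[int]:
    """Return the frame index at which a naive energy VAD would declare the
    user's turn over, or None if it never does.

    The turn ends after `hangover_frames` consecutive quiet frames that
    follow at least one voiced frame. Nothing here looks at *what* was said.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= energy_threshold:
            heard_speech = True
            silent_run = 0  # any voiced frame resets the silence timer
        elif heard_speech:
            silent_run += 1
            if silent_run >= hangover_frames:
                return i
    return None


if __name__ == "__main__":
    speech = [[0.1] * 160] * 10        # 100 ms of "talking"
    thinking_pause = [[0.0] * 160] * 30  # 300 ms pause mid-sentence
    # The caller is not finished, but the endpointer cuts them off anyway.
    print(end_of_turn_frame(speech + thinking_pause + speech))
```

With these numbers the 300 ms pause ends the turn at frame 34, before the second burst of speech arrives. Raising the hangover just trades interruptions for sluggish replies, which is why deciding "is this person actually done?" ends up needing the model's understanding of the conversation.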

Maybe it’s hard for them to train it to only respond once it seems appropriate to do so?

Sean-Der an hour ago

I am excited for VAD to go away. PersonaPlex totally seems like the future. However, for things like a call-center helpline, turn-based actually seems better! You don't want to be interrupted when you're giving information back and forth (I think?).

wnmurphy 2 hours ago

Exactly. It's a tangent, but clearly a pain point for enough users.