dust42 6 hours ago
Low-latency inference is very useful in voice-to-voice applications. You say it is a waste of power, but their claim, at least, is that it is 10x more efficient. We'll see, but if it works out it will definitely find its applications.
|
zozbot234 6 hours ago
This isn't voice-to-voice, though; end-to-end voice chat models (the "Her" UX) are completely different.
dust42 6 hours ago
I haven't found any end-to-end voice chat models useful; I got much better results with a separate STT-LLM-TTS pipeline. One big problem is turn detection, and having inference at 150-200ms latency would allow a whole new level of quality. I would just use the fast model with a prompt like "Do you think the user is finished talking?" and then push the conversation to a larger model. The AI should reply within roughly 600-1000ms: faster is often irritating, slower makes the user start talking again.
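A rough sketch of the pattern I mean, assuming hypothetical stt/fast_llm/big_llm/tts/mic clients (none of these are a real API, just stand-ins for whatever stack you use):

    # Minimal sketch of STT-LLM-TTS with fast-model turn detection.
    # All client objects and their methods here are hypothetical.
    import time

    TURN_PROMPT = (
        "Partial transcript of what the user is saying:\n{transcript}\n"
        "Do you think the user is finished talking? Answer YES or NO."
    )

    def user_is_done(fast_llm, transcript: str) -> bool:
        # The small low-latency model only classifies end-of-turn.
        answer = fast_llm.complete(TURN_PROMPT.format(transcript=transcript))
        return answer.strip().upper().startswith("YES")

    def voice_loop(stt, fast_llm, big_llm, tts, mic):
        transcript = ""
        while True:
            chunk = mic.read()                    # raw audio frames
            transcript += stt.transcribe(chunk)   # incremental STT
            done_at = time.monotonic()
            if not user_is_done(fast_llm, transcript):
                continue
            # Only a detected end of turn reaches the larger model.
            reply = big_llm.complete(transcript)
            # Target ~600-1000ms from end of speech to first audio:
            # faster feels like being interrupted, slower invites the
            # user to start talking over the reply.
            elapsed = time.monotonic() - done_at
            time.sleep(max(0.0, 0.6 - elapsed))
            tts.speak(reply)
            transcript = ""

Constraining the classifier to a YES/NO answer keeps the fast model's output to a token or two, which is what makes a 150-200ms turn-detection budget plausible in the first place.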
|