lukax 3 hours ago
Or you could use Soniox Real-time (supports 60 languages), which natively supports endpoint detection: the model is trained to figure out when a user's turn has ended. This always works better than VAD. https://soniox.com/docs/stt/rt/endpoint-detection

Soniox also wins the independent benchmarks done by Daily, the company behind Pipecat. https://www.daily.co/blog/benchmarking-stt-for-voice-agents/

You can try a demo on the home page.

Disclaimer: I used to work for Soniox.

Edit: I commented too soon. I only saw VAD and immediately thought of Soniox, which was the first service to implement real-time endpoint detection last year.
nicktikhonov 3 hours ago (reply to lukax)
If you read the post, you'll see that I used Deepgram's Flux. It also does endpointing and is a higher-level abstraction than VAD.