oidar 4 days ago

What's the best solution right now for STT (speech-to-text) that supports speaker diarisation?

makaimc 4 days ago | parent | next [-]

AssemblyAI (YC S17) currently stands out in the WER and accuracy benchmarks (https://www.assemblyai.com/benchmarks). Note that its models are accessed through a web API rather than hosted locally; speaker diarization is enabled through a parameter in the API call (https://www.assemblyai.com/docs/speech-to-text/pre-recorded-...).
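
For anyone wondering what "a parameter in the API call" looks like in practice, here is a rough sketch. It assumes the v2 transcript endpoint, the `speaker_labels` request field, and an `utterances` list in the finished transcript, so check the linked docs before relying on any of it. The helper names `build_transcript_request` and `format_utterances` are made up for illustration, and nothing is sent over the network until you pass the request to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current AssemblyAI docs.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a transcription request with diarization on."""
    payload = {
        "audio_url": audio_url,
        "speaker_labels": True,  # assumed name of the diarization parameter
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def format_utterances(transcript: dict) -> list[str]:
    """Turn a finished transcript into 'Speaker X: text' lines.

    Assumes each utterance carries 'speaker' and 'text' keys.
    """
    utterances = transcript.get("utterances") or []
    return [f"Speaker {u['speaker']}: {u['text']}" for u in utterances]
```

To actually submit the job you would call `urllib.request.urlopen(build_transcript_request(key, url))` and then poll the returned transcript id until it completes; the polling step is left out here.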

xnx 4 days ago | parent | prev [-]

I like this version of Whisper which has diarization built in: https://github.com/Purfview/whisper-standalone-win