pmarreck 20 hours ago

Now if it only did separate speaker identification (diarization)

harryf 10 hours ago | parent [-]

It’s fairly easy to get diarization working with pyannote.audio and https://huggingface.co/pyannote/speaker-diarization-3.1, with ffmpeg first converting the audio to a 16kHz mono WAV file. But it really depends a lot on the audio: a two-person podcast where the speakers allow each other space works, but lots of people with overlapping voices on the audio is not so great.
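
Roughly, the ffmpeg + pyannote steps above could look like the sketch below. It assumes pyannote.audio 3.x is installed, ffmpeg is on the PATH, and you have a Hugging Face access token for the model. The names ffmpeg_cmd, diarize, hf_token and converted.wav are made up for illustration, and the use_auth_token argument may be named differently in other pyannote.audio versions.

```python
import subprocess


def ffmpeg_cmd(src, dst):
    """Build the ffmpeg call that converts src to a 16 kHz mono WAV at dst."""
    # -y overwrites dst, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]


def diarize(src, hf_token, wav="converted.wav"):
    """Convert src with ffmpeg, then return (start, end, speaker) tuples."""
    # Imported here so this file still loads without pyannote installed.
    from pyannote.audio import Pipeline

    subprocess.run(ffmpeg_cmd(src, wav), check=True)

    # hf_token is a placeholder for your own Hugging Face access token.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    diarization = pipeline(wav)

    # Each track is a time segment plus the label of whoever is speaking.
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]
```

Something like diarize("podcast.mp3", hf_token="...") would then give you a list of speaker turns to line up against a transcript. The first run downloads the model, and you may need to accept its terms on Hugging Face before the token works.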