Remix clone Hacker News

observationist 4 hours ago

Native diarization; this looks exciting. Edit: or not, there's no diarization in real time.

https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

~9GB model.

coder543 3 hours ago | parent

The diarization is on Voxtral Mini Transcribe V2, not Voxtral Mini 4B.

sbrother 3 hours ago | parent

Do you have experience with that model for diarization? Does it feel accurate, and what's its real-time factor on a typical GPU? Diarization has been the biggest thorn in my side for a long time.

ashenke 24 minutes ago | parent

You can test it yourself for free here: https://console.mistral.ai/build/audio/speech-to-text

I tried it on an English-speaking podcast episode, and apart from identifying one host as two different speakers (only once, for a few sentences at the start), the rest was flawless from what I could see.

coder543 2 hours ago | parent

> Do you have experience with that model

No, I just heard about it this morning.

observationist 3 hours ago | parent

Ahh, yeah, and it explicitly doesn't work on real-time streams. Good catch!