simonw 3 hours ago
This demo is really impressive: https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtim...

Don't be confused if it says "no microphone"; the moment you click the record button, it will request browser permission and then start working.

I spoke fast and dropped in some jargon, and it got it all right. I said this and it transcribed it exactly, WebAssembly spelling included:

> Can you tell me about RSS and Atom and the role of CSP headers in browser security, especially if you're using WebAssembly?
skykooler 3 minutes ago
Doesn't seem to work for me. I tried in both Firefox and Chromium, and I can see the waveform when I talk, but the transcription just shows "Awaiting audio input".
Oras 3 hours ago
Thank you for the link! Their playground on Mistral doesn't have a microphone; it just uploads files, which doesn't demonstrate the speed and accuracy, but the link you shared does. I tried speaking in two languages at once, and it picked it up correctly. Truly impressive for real-time.
| |||||||||||||||||
tekacs 3 hours ago
Having built with and tried every voice model over the last three years, real-time and non-real-time... this is off the charts compared to anything I've seen before. And open weight too! So grateful for this.
daemonologist 3 hours ago
404 on https://mistralai-voxtral-mini-realtime.hf.space/gradio_api/... for me (which shows up in the UI as a little red error in the top right).
jaggederest 2 hours ago
It can transcribe the fast sequence from Eminem's "Rap God". Really, really impressive.
| |||||||||||||||||
pyprism 2 hours ago
Wow, that’s weird. I tried Bengali, but the text was transcribed into Hindi! I know there are some similar words in these languages, but I used pure Bengali that is not similar to Hindi.
| |||||||||||||||||
carbocation an hour ago
This model was able to transcribe Bad Bunny lyrics over the sound of the background music, played casually from my speakers. Impressive, to me.
sheepscreek an hour ago
I’ve been using AquaVoice for real-time transcription for a while now, and it has become a core part of my workflow. It gets everything: jargon, capitalization, everything. Now I’m looking forward to doing that with 100% local inference!
rafram 2 hours ago
Not terrible. It missed or mixed up a lot of words when I was speaking quickly (and not enunciating very well), but it does well with normal-paced speech.