simonw 3 hours ago

This demo is really impressive: https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtim...

Don't be confused if it says "no microphone": the moment you click the record button, it will request browser permission and then start working.

I spoke fast and dropped in some jargon, and it got it all right. I said this and it transcribed it exactly, WebAssembly spelling included:

> Can you tell me about RSS and Atom and the role of CSP headers in browser security, especially if you're using WebAssembly?

skykooler 3 minutes ago

Doesn't seem to work for me. I tried it in both Firefox and Chromium, and I can see the waveform when I talk, but the transcription just shows "Awaiting audio input".

Oras 3 hours ago

Thank you for the link! Mistral's own playground doesn't have a microphone option; it only accepts file uploads, which doesn't demonstrate the speed and accuracy. The link you shared does.

I tried speaking in two languages at once, and it picked up both correctly. Truly impressive for real-time.

druskacik an hour ago

According to the announcement blog post, Le Chat is also powered by the new model: https://chat.mistral.ai/chat

tekacs 3 hours ago

Having built with and tried every voice model over the last three years, real-time and non-real-time, I can say this is off the charts compared to anything I've seen before.

And open weight too! So grateful for this.

daemonologist 3 hours ago

404 on https://mistralai-voxtral-mini-realtime.hf.space/gradio_api/... for me (which shows up in the UI as a little red error in the top right).

jaggederest 2 hours ago

It can transcribe the fast sequence in Eminem's "Rap God". Really, really impressive.

rafram 2 hours ago

That's almost certainly in the training data, to be fair.

keeganpoppen 38 minutes ago

what a great test hahah

pyprism 2 hours ago

Wow, that’s weird. I tried Bengali, but the text was transcribed into Hindi! I know the two languages share some similar words, but I used pure Bengali that is not similar to Hindi.

derefr 2 hours ago

Well, on the linked page, it mentions "strong transcription performance in 13 languages, including [...] Hindi" but with no mention of Bengali. It probably doesn't know a lick of Bengali, and is just trying to snap your words into the closest language it does know.

keeganpoppen 37 minutes ago

It must have some exposure to Bengali, just not enough for them to advertise it. Otherwise it would have a damn hard time.

carbocation an hour ago

This model was able to transcribe Bad Bunny lyrics over the sound of the background music, played casually from my speakers. Impressive, to me.

sheepscreek an hour ago

I’ve been using AquaVoice for real-time transcription for a while now, and it has become a core part of my workflow. It gets everything right: jargon, capitalization, everything. Now I’m looking forward to doing that with 100% local inference!

rafram 2 hours ago

Not terrible. It missed or mixed up a lot of words when I was speaking quickly (and not enunciating very well), but it does well with normal-paced speech.
