Curiositry 15 hours ago

This was a breeze to install on Linux. However, I haven't managed to get realtime transcription working yet, à la Whisper.cpp stream or Moonshine.

--from-mic only supports Mac. I'm able to capture audio with ffmpeg, but adapting the ffmpeg example to use mic capture hasn't worked yet:

ffmpeg -f pulse -channels 1 -i 1 -f s16le - 2>/dev/null | ./voxtral -d voxtral-model --stdin

It's possible my system is simply under spec for the default model.

I'd like to be able to use this with the voxtral-q4.gguf quantized model from here: https://huggingface.co/TrevorJS/voxtral-mini-realtime-gguf

jwrallie 13 hours ago

I am interested in a way to capture audio not only from the mic, but also from one of the monitor ports so you could pipe the audio you are hearing from the web directly for real-time transcription with one of these solutions. Did anyone manage to do that?

I can, for example, capture audio from that with Audacity or OBS Studio and do it later, so it should be possible to do it in real time too assuming my machine can keep up.

bebna 11 hours ago

Change -i 1 to -i default or to one of your monitor sources; you can list them with pactl list short sources

https://trac.ffmpeg.org/wiki/Capture/PulseAudio
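
A minimal sketch of that lookup, assuming monitor sources follow the usual PulseAudio naming (the sink name with a .monitor suffix) and that the name is the second tab-separated column of pactl list short sources. The helper name first_monitor is made up for illustration, and the ffmpeg line in the comment is just the command from the top of the thread with -i swapped out; it is untested with voxtral.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first monitor source.
# Reads `pactl list short sources` output on stdin
# (tab-separated columns: index, name, driver, sample spec, state).
first_monitor() {
    awk -F '\t' '$2 ~ /\.monitor$/ { print $2; exit }'
}

# Usage (assumes pactl, ffmpeg and ./voxtral are installed):
#   src=$(pactl list short sources | first_monitor)
#   ffmpeg -f pulse -channels 1 -i "$src" -f s16le - 2>/dev/null \
#       | ./voxtral -d voxtral-model --stdin
```

If more than one sink is active, you would want to pick the monitor by name rather than taking the first match.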

yjftsjthsd-h 14 hours ago

Does it work if you use ffmpeg to feed it audio from a file? I personally would try file->ffmpeg->voxtral then mic->ffmpeg->file, and then try to glue together mic->ffmpeg->voxtral.

(But take this with a grain of salt; I haven't tried it yet.)

Curiositry 11 hours ago

Recording audio with ffmpeg and transcribing a file piped from ffmpeg both work.

Given that it took 19.64 mins to transcribe the 11 second sample wav, it’s possible I just didn’t wait long enough :)
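
To put a number on that: a rough real-time factor (processing time divided by audio duration) can be worked out from the two figures above. The function name rtf is made up for illustration; anything above 1 means a live stream cannot keep up.

```shell
#!/bin/sh
# Hypothetical helper: real-time factor = processing seconds / audio seconds.
rtf() {
    awk -v proc_s="$1" -v audio_s="$2" 'BEGIN { printf "%.1f\n", proc_s / audio_s }'
}

# 19.64 min * 60 = 1178.4 s of processing for an 11 s clip
rtf 1178.4 11   # prints 107.1
```

So on this machine the default model runs roughly 107 times slower than real time, which would explain why the mic pipeline appears to produce nothing.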

yjftsjthsd-h 10 hours ago

Ah. In that case... Yeah. Is it using GPU, and does the whole model fit in your (V)RAM?

ekianjo 10 hours ago

This is a CPU implementation only.

yjftsjthsd-h 3 hours ago

Oh, that's interesting. The readme talks about GPU acceleration on Apple Silicon, and I didn't see anything explicit for other platforms, so I assumed it needs a GPU everywhere. But it uses BLAS acceleration, which a web search suggests is just a CPU-optimized math library. That's great; it should really increase the number of places where it's useful :)