| ▲ | yjftsjthsd-h 14 hours ago |
| Does it work if you use ffmpeg to feed it audio from a file? I personally would try file->ffmpeg->voxtral, then mic->ffmpeg->file, and then try to glue together mic->ffmpeg->voxtral. (But take this with a grain of salt; I haven't tried it yet.) |
|
| ▲ | Curiositry 11 hours ago | parent [-] |
| Recording audio with FFmpeg and transcribing a file that’s piped from FFmpeg both work. Given that it took 19.64 minutes to transcribe the 11-second sample WAV, it’s possible I just didn’t wait long enough :) |
| |
| ▲ | yjftsjthsd-h 10 hours ago | parent [-] | | Ah. In that case... Yeah. Is it using GPU, and does the whole model fit in your (V)RAM? | | |
| ▲ | ekianjo 10 hours ago | parent [-] | | This is a CPU implementation only. | | |
| ▲ | yjftsjthsd-h 3 hours ago | parent [-] | | Oh, that's interesting. The readme talks about GPU acceleration on Apple Silicon, and I didn't see anything explicit for other platforms, so I assumed it needs a GPU everywhere. But it does BLAS acceleration, which a web search seems to agree is just a CPU-optimized math library. That's great; it should really increase the number of places where it's useful :) |
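
For anyone who wants to try the file->ffmpeg->transcriber step discussed above, here is a rough Python sketch. The ffmpeg options are standard ones, but the rest is assumption: the helper names are made up for illustration, `transcriber_cmd` is a placeholder for however your voxtral build accepts audio on stdin (check its readme), and 16 kHz mono 16-bit PCM is only a guess at the input format it wants. The last helper just turns the timing quoted above into a real-time factor.

```python
import subprocess


def ffmpeg_decode_cmd(src, sample_rate=16000):
    """Build an ffmpeg command that decodes `src` to raw PCM on stdout.

    Assumption: the consumer wants mono, signed 16-bit little-endian samples.
    """
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", src,
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample
        "-",                      # write to stdout instead of a file
    ]


def pipe_to_transcriber(src, transcriber_cmd):
    """Run ffmpeg and feed its stdout to `transcriber_cmd` (hypothetical argv list)."""
    decoder = subprocess.Popen(ffmpeg_decode_cmd(src), stdout=subprocess.PIPE)
    consumer = subprocess.Popen(transcriber_cmd, stdin=decoder.stdout)
    decoder.stdout.close()  # so ffmpeg gets a broken pipe if the consumer exits early
    consumer.wait()
    decoder.wait()
    return consumer.returncode


def real_time_factor(processing_seconds, audio_seconds):
    """How many seconds of compute were spent per second of audio."""
    return processing_seconds / audio_seconds
```

With the numbers from the thread, `real_time_factor(19.64 * 60, 11)` comes out to roughly 107, i.e. about 107 seconds of CPU-side processing per second of audio, which would explain why a live mic test looked like it was hanging.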
|
|
|