mdrzn 3 hours ago

There's no comparison to Whisper Large v3 or other Whisper models.

Is it better? Worse? Why do they only compare to gpt4o mini transcribe?

tekacs 3 hours ago | parent | next [-]

WER is slightly misleading, but Whisper Large v3 WER is classically around 10%, I think, and 12% with Turbo.

The thing that makes it particularly misleading is that models that do transcription to lowercase and then use inverse text normalization to restore structure and grammar end up making a very different class of mistakes than Whisper, which goes directly to final form text including punctuation and quotes and tone.

But nonetheless, they're claiming such a lower error rate than Whisper that it's almost not in the same bucket.
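
To make the normalization point concrete, here is a minimal sketch of WER as word-level edit distance divided by the reference word count. The `normalize` helper is hypothetical and deliberately crude (lowercase, strip punctuation); real benchmark normalizers are more elaborate, so treat this as an illustration rather than how any vendor computes its numbers.

```python
import re


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


def normalize(text: str) -> str:
    """Hypothetical crude normalizer: lowercase and drop punctuation."""
    return re.sub(r"[^\w\s']", "", text.lower())


reference = 'He said, "Stop!"'
hypothesis = "he said stop"

raw_score = wer(reference, hypothesis)
normalized_score = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw_score:.2f}, normalized WER: {normalized_score:.2f}")
```

Scored raw, every word counts as an error because of casing and punctuation alone; scored after normalization, the same pair matches exactly. That gap is why WER figures for models with different output conventions are hard to compare directly.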

tekacs 3 hours ago | parent [-]

On the topic of things being misleading, GPT-4o transcriber is a very _different_ transcriber to Whisper. I would say it's neither better nor worse, despite characterizations as such. So it is a little difficult to compare on just the numbers.

There's a reason that quite a lot of good transcribers still use V2, not V3.

satvikpendem 3 hours ago | parent [-]

Different how?

GaggiX 3 hours ago | parent | prev [-]

Gpt4o mini transcribe is better and actually realtime. Whisper is trained to encode the entire audio (or at least 30s chunks) and then decode it.

mdrzn 3 hours ago | parent | next [-]

So "gpt4o mini transcribe" is not just whisper v3 under the hood? Btw it's $0.006 / minute

For an online Whisper API (with v3 large) I've found "$0.00125 per compute second", which is the absolute cheapest I've ever found.

breisa 22 minutes ago | parent | next [-]

Deepinfra offers Whisper V3 at $0.00045 / minute of transcribed audio.
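
The three prices quoted in this subthread use different units, so here is a rough sketch that puts them on one scale. The rates are the ones quoted above; the function names are made up for illustration, and how many compute seconds a provider actually bills per minute of audio is not stated anywhere in the thread, so the sketch only computes the break-even point rather than assuming a value.

```python
# Rates as quoted in the thread (USD).
GPT4O_MINI_TRANSCRIBE_PER_AUDIO_MIN = 0.006
DEEPINFRA_WHISPER_V3_PER_AUDIO_MIN = 0.00045
WHISPER_API_PER_COMPUTE_SEC = 0.00125


def cost_per_audio_hour(rate_per_audio_min: float) -> float:
    """Convert a per-audio-minute rate to cost per hour of audio."""
    return rate_per_audio_min * 60


def break_even_compute_seconds(rate_per_audio_min: float,
                               rate_per_compute_sec: float) -> float:
    """Compute seconds per audio minute at which both pricing schemes cost the same.

    If the per-compute-second provider needs fewer compute seconds than this
    for one minute of audio, it is the cheaper option.
    """
    if rate_per_compute_sec <= 0:
        raise ValueError("rate_per_compute_sec must be positive")
    return rate_per_audio_min / rate_per_compute_sec


for name, rate in [
    ("gpt4o mini transcribe", GPT4O_MINI_TRANSCRIBE_PER_AUDIO_MIN),
    ("Deepinfra Whisper V3", DEEPINFRA_WHISPER_V3_PER_AUDIO_MIN),
]:
    hourly = cost_per_audio_hour(rate)
    threshold = break_even_compute_seconds(rate, WHISPER_API_PER_COMPUTE_SEC)
    print(f"{name}: ${hourly:.3f} per audio hour; "
          f"per-compute-second pricing ties at {threshold:.2f} compute s per audio min")
```

In other words, the per-compute-second offer only beats the per-minute ones if it transcribes a minute of audio in fewer compute seconds than the printed threshold.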

GaggiX 3 hours ago | parent | prev [-]

>So it's not just whisper v3 under the hood?

Why should it be Whisper v3? They even released an open model: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

emmettm 3 hours ago | parent | prev [-]

The linked article claims the average word error rate for Voxtral mini v2 is lower than that of GPT-4o mini transcribe.

GaggiX 3 hours ago | parent [-]

Gpt4o mini transcribe is better than Whisper; the context is the parent comment.