trenchpilgrim a day ago

Whisper has serious problems with hallucination. It will inject sentences that were never said in the audio.

It's decent for classification but poor at transcription.

neckro23 18 hours ago

Pre-processing with a vocal extraction model (BS-RoFormer or similar) helps a lot with the hallucinations, especially with poor-quality sources.

trenchpilgrim 17 hours ago

I'm working with fairly "clean" audio (voice only) and still see ridiculous hallucinations.