heroprotagonist 8 hours ago

Not to promote something, but Wispr Flow does that for me automatically if I enable a setting for it.

While it's a commercial product with a subscription, I spent a long time on the free tier not even hitting their limits until I started using it so extensively that I wanted to pay for it.

And I've used Whisper in the past, mostly for tinkering. I tried it for a couple of use cases but haven't touched the base project in a while. But I do regularly use Faster-Whisper-XXL, an open source project based on Whisper, for subtitle generation.

Though, for subtitle generation, I decided to support the project and mainly use the non-public build of Faster-Whisper-XXL Pro, built for donors to the open source project.

The extra features smooth out the subtitle editing process substantially. Add "--roformer_overlap 0.125 --roformer_vram 16 --best_of 15 --ff_vocal_extract mb-roformer --vad_method pyannote_v3" to the CLI parameters (and sometimes --realign) and you have much less work to do in SubtitleEdit or Tero Subtitler afterwards to clean it up.
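
If you script this over a batch of files, a minimal Python sketch for assembling that command line might look like the following. The flags are copied verbatim from the list above; the executable name `faster-whisper-xxl`, the input file name, and the helper `build_command` are placeholders I made up for illustration, so substitute whatever your build actually uses. It only builds and prints the command, it does not run anything.

```python
import shlex

# Flags copied verbatim from the comment above. They are not checked
# against any particular build, so treat them as-is.
EXTRA_FLAGS = (
    "--roformer_overlap 0.125 --roformer_vram 16 --best_of 15 "
    "--ff_vocal_extract mb-roformer --vad_method pyannote_v3"
)


def build_command(executable, media_path, realign=False):
    """Return the argument list for one transcription run.

    `executable` and `media_path` are caller-supplied; the names used in
    the demo below are hypothetical placeholders.
    """
    cmd = [executable, media_path, *shlex.split(EXTRA_FLAGS)]
    if realign:
        # The "sometimes" flag mentioned above.
        cmd.append("--realign")
    return cmd


if __name__ == "__main__":
    # Hypothetical executable and file names, for illustration only.
    demo = build_command("faster-whisper-xxl", "episode01.mkv", realign=True)
    print(shlex.join(demo))
```

Building the arguments as a list means the result can be handed straight to `subprocess.run(cmd)` without shell quoting problems for file names that contain spaces.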

iib 2 hours ago

Surprisingly, it's the whisper model itself that does that. I find that it's also good with false starts, often correcting something like: "uhm, we could...we can go there" to just "we can go there", if spoken rapidly enough.

dotancohen 4 hours ago

I'd love to hear more about subtitle generation. Specifically, can you label different speakers? I'd be using this for meeting transcription. Thank you.