Remix clone Hacker News

	▲	heroprotagonist 8 hours ago
		Not to promote something, but Wispr Flow does that for me automatically when I enable a setting for it. It's a commercial product with a subscription, but I spent a long time on the free tier without hitting its limits, until I started using it so extensively that I wanted to pay for it. I've also used Whisper in the past, mostly for tinkering: I tried it for a couple of use cases but haven't touched the base project in a while. I do regularly use Faster-Whisper-XXL, an open source project based on Whisper, for subtitle generation. For that, I decided to support the project and mainly use the non-public build, Faster-Whisper-XXL Pro, which is built for donors to the open source project. The extra features smooth out the subtitle editing process very substantially. Toss "--roformer_overlap 0.125 --roformer_vram 16 --best_of 15 --ff_vocal_extract mb-roformer --vad_method pyannote_v3" into the CLI parameters (and sometimes --realign) and you have much less work to do in SubtitleEdit or Tero Subtitler afterwards to clean it up.
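For anyone who wants to script that, here is a minimal sketch that only assembles the command line from the flags quoted above. The executable name `faster-whisper-xxl`, the position of the media path, and the file name `episode01.mkv` are assumptions for illustration, not something the comment specifies; the flags themselves are copied verbatim and not verified here.

```python
import shlex

# Flags copied verbatim from the comment above; their behavior is not verified here.
QUOTED_FLAGS = (
    "--roformer_overlap 0.125 --roformer_vram 16 --best_of 15 "
    "--ff_vocal_extract mb-roformer --vad_method pyannote_v3"
)


def build_command(media_path, realign=False, executable="faster-whisper-xxl"):
    """Return the argument list for one subtitle-generation run.

    `executable` and the placement of `media_path` are assumptions;
    adjust them to match the build you actually have installed.
    """
    args = [executable, media_path] + shlex.split(QUOTED_FLAGS)
    if realign:
        # The comment says this flag is only added sometimes.
        args.append("--realign")
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_command("episode01.mkv", realign=True)))
```

Building the arguments as a list keeps file names with spaces intact if the list is later handed to `subprocess.run`.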
	▲	iib 2 hours ago
		Surprisingly, it's the Whisper model itself that does that. I find that it's also good with false starts, often correcting something like "uhm, we could...we can go there" to just "we can go there", if spoken rapidly enough.
	▲	dotancohen 4 hours ago
		I'd love to hear more about subtitle generation. Specifically, can you label different speakers? I'd be using this for meeting transcription. Thank you.