I've heard that, but that doesn't sound like a useful approach for videos where (1) non-speech segments can have plenty of other sound (music, noise) and (2) you want timestamps to match up with the original video, like for subtitles. But maybe there are known mitigations for both of those issues that I'm not aware of. And if they do exist maybe they can be included in the ffmpeg whisper integration.

▲

miki123211 a day ago | parent [-]

By "delete", people mostly mean "detect", so that you can avoid processing such segments through Whisper. There's no reason to actually cut the silence out from the original audio file.

	▲	21 hours ago \| parent [-]
		[deleted]