You usually delete silence before using something like Whisper.

re a day ago | parent | next [-]

I've heard that, but that doesn't sound like a useful approach for videos where (1) non-speech segments can have plenty of other sound (music, noise) and (2) you want timestamps to match up with the original video, like for subtitles. But maybe there are known mitigations for both of those issues that I'm not aware of. And if they do exist maybe they can be included in the ffmpeg whisper integration.

miki123211 a day ago | parent [-]

By "delete", people mostly mean "detect", so that you can avoid processing such segments through Whisper. There's no reason to actually cut the silence out from the original audio file.
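
To make the "detect, don't cut" idea concrete, here is a minimal sketch that assumes mono float samples in [-1, 1] already decoded into a Python list. The function name `detect_speech_segments`, the RMS threshold, and the frame sizes are all illustrative choices, not anything taken from Whisper or ffmpeg. It is a plain energy gate, so it would not handle the music/noise case raised above; a real voice activity detector would be needed for that.

```python
import math

def detect_speech_segments(samples, sample_rate, frame_ms=30,
                           threshold=0.01, min_silence_ms=300):
    """Return (start_sec, end_sec) spans whose frame RMS is above threshold.

    Times are offsets into the ORIGINAL audio; nothing is cut out.
    All parameter defaults are illustrative assumptions.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    min_silence_frames = max(1, min_silence_ms // frame_ms)
    n_frames = len(samples) // frame_len

    segments = []
    start = None      # index of the first loud frame of the open segment
    silent_run = 0    # consecutive quiet frames seen inside the open segment

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first quiet frame of this run.
                end = i - silent_run + 1
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended while a segment was open; drop trailing quiet frames.
        end = n_frames - silent_run
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments
```

You would then transcribe only those spans and add each span's start offset back onto the timestamps the model returns, so subtitles still line up with the original video.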

hnlmorg a day ago | parent | prev [-]

This is designed for real-time use too. And in such cases, you couldn’t delete the silence before use.

42lux a day ago | parent [-]

The ffmpeg implementation might be, but the example was not.