supernes 6 hours ago

This approach seems kind of backwards to me. Why try to detect everything except the thing you're trying to remove? You could either sample a few uhs and ums and treat them as noise to be silenced (with a sharp crossfade to the noise floor that doesn't interrupt speech flow), or fine-tune a model to detect them specifically for full automation.
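
For the crossfade part, here is a rough sketch of the idea, not a tested audio pipeline. It assumes the filler span has already been located somehow (the `start` and `end` sample indices are given), that audio is a plain list of floats in [-1, 1], and that Gaussian noise at a guessed `noise_rms` is a good enough stand-in for the recording's real noise floor. The function name and all parameters are made up for illustration.

```python
import random

def silence_span(samples, start, end, noise_rms=0.001, fade=64, seed=0):
    """Replace samples[start:end] with low-level noise, with a short
    linear crossfade at each edge so the cut does not click.

    Hypothetical helper: noise_rms and fade are illustrative guesses,
    not values taken from any real tool.
    """
    rng = random.Random(seed)
    out = list(samples)
    n = end - start
    fade = min(fade, n // 2)  # keep the two fades from overlapping
    for i in range(n):
        # w is the weight of the original signal:
        # close to 1 at the span edges, 0 in the interior.
        if i < fade:
            w = 1.0 - (i + 1) / (fade + 1)
        elif i >= n - fade:
            w = 1.0 - (n - i) / (fade + 1)
        else:
            w = 0.0
        noise = rng.gauss(0.0, noise_rms)  # stand-in for room tone
        out[start + i] = w * samples[start + i] + (1.0 - w) * noise
    return out
```

A smaller `fade` makes the transition sharper. In practice you would probably want to loop a sampled stretch of real room tone instead of synthetic noise, but the blending logic would stay the same.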