supernes 6 hours ago
This approach seems kind of backwards to me. Why try to detect everything except the thing you're trying to remove? Why not either sample a few uhs and ums and treat them as noise to be silenced (with a sharp crossfade to the noise floor that doesn't interrupt the flow of speech), or fine-tune a model to detect them specifically, for full automation?
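A minimal sketch of the first alternative (sample the fillers, then fade them out) might look like the following. It assumes mono float samples, a single hand-picked filler template, and a short recording of room tone standing in for the noise floor. The function names are hypothetical. Matching on raw samples is a deliberate simplification: real "uh"/"um" instances vary in pitch and length, so a practical version would correlate spectral features (e.g. MFCCs) from several templates instead.

```python
import math
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) in samples


def find_filler_spans(signal: Sequence[float], template: Sequence[float],
                      threshold: float = 0.8) -> List[Span]:
    """Hypothetical detector: flag windows whose normalized cross-correlation
    with a sampled filler template is at least `threshold`."""
    m = len(template)
    t_mean = sum(template) / m
    t_c = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_c))
    spans: List[Span] = []
    if t_norm == 0.0:
        return spans  # a flat template can't be matched
    i = 0
    while i + m <= len(signal):
        window = signal[i:i + m]
        w_mean = sum(window) / m
        w_c = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_c))
        if w_norm > 0.0:  # skip silent windows (undefined correlation)
            ncc = sum(a * b for a, b in zip(w_c, t_c)) / (w_norm * t_norm)
            if ncc >= threshold:
                spans.append((i, i + m))
                i += m  # jump past the match so hits don't overlap
                continue
        i += 1
    return spans


def crossfade_to_room_tone(signal: Sequence[float], spans: Sequence[Span],
                           room_tone: Sequence[float],
                           fade: int = 32) -> List[float]:
    """Replace each span with looped room tone, using an equal-power
    crossfade `fade` samples long at both edges of the span."""
    if not room_tone:
        raise ValueError("room_tone must contain at least one sample")
    # mix[i] == 0 keeps the original sample; 1 means pure room tone.
    mix = [0.0] * len(signal)
    for start, end in spans:
        for i in range(max(0, start), min(len(signal), end)):
            d = min(i - start, end - 1 - i)  # distance to the nearest span edge
            t = 1.0 if fade <= 0 else min(1.0, d / fade)
            mix[i] = max(mix[i], t)
    out: List[float] = []
    for i, x in enumerate(signal):
        theta = mix[i] * math.pi / 2
        noise = room_tone[i % len(room_tone)]
        # Equal-power law: cos^2 + sin^2 == 1 keeps perceived loudness steady.
        out.append(math.cos(theta) * x + math.sin(theta) * noise)
    return out
```

An equal-power (cosine/sine) crossfade is used rather than a linear one because the speech and the room tone are uncorrelated, so a linear fade would produce an audible dip in level mid-transition. Keeping `fade` small gives the "sharp" crossfade described above, short enough not to eat into neighbouring words.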