Remix clone Hacker News

new | show | ask | jobs Github

	▲	recursive 6 hours ago
		I've noticed this too. I have a few theories about this. Disclosure: I know a little about audio, and very little about audio generative AI. First, perhaps the models are trained on relatively low-bitrate encodings. Just like image generations sometimes generate JPG artifacts, we could be hearing the known high-frequency loss of low data rate encodings. Another idea is that 'S' and 'T' sounds and similar are relatively broad-spectrum sounds. Not unlike white noise. That kind of sound is known to be difficult to encode for lossy frequency-domain encoding schemes. Perhaps these models work in a similar domain and are subject to similar constraints. Perhaps there's a balance here of low-pass filter vs. "warbly" sounds, and we're hearing a middle ground compromise. I don't know how it happens, but when I hear the "AI" sound in music, this is usually one of the first tells.