daemonologist · 11 hours ago

It's interesting to me that all AI music sounds slightly sibilant - like someone taped a sheet of paper to the speaker or covered my head in dry leaves. I know no model is perfect, but I'd have thought they'd have ironed out this problem by now, given how pervasive it is and how significantly it degrades the end product.
recursive · 6 hours ago

I've noticed this too, and I have a few theories. Disclosure: I know a little about audio, and very little about generative audio AI.

First, perhaps the models are trained on relatively low-bitrate encodings. Just as image models sometimes reproduce JPEG artifacts, we could be hearing the familiar high-frequency loss of low-bitrate audio codecs.

Another idea: 'S' and 'T' sounds and similar consonants are relatively broad-spectrum, not unlike white noise. That kind of sound is notoriously difficult for lossy frequency-domain encoding schemes, which spend their bits on a few strong spectral components. Perhaps these models work in a similar domain and are subject to similar constraints. There may be a trade-off here between a dull low-passed sound and "warbly" artifacts, and we're hearing a middle-ground compromise.

I don't know exactly how it happens, but when I hear the "AI" sound in music, this is usually one of the first tells.
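The frequency-domain point can be illustrated with a toy experiment (my own sketch, not anything from the actual models or codecs): reconstruct a signal from only its k largest DFT bins, as a crude stand-in for a lossy frequency-domain coder, and compare how much energy a steady tone loses versus a broadband, 's'-like noise burst.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sr = 2048, 8000
tone = np.sin(2 * np.pi * 220 * np.arange(n) / sr)  # steady pitched tone
noise = rng.standard_normal(n)                      # broadband, "s"-like hiss

def topk_error(x, k):
    """Reconstruct x from only its k largest-magnitude DFT bins;
    return the fraction of signal energy lost."""
    X = np.fft.rfft(x)
    keep = np.argsort(np.abs(X))[-k:]
    Xq = np.zeros_like(X)
    Xq[keep] = X[keep]
    xq = np.fft.irfft(Xq, n=len(x))
    return np.sum((x - xq) ** 2) / np.sum(x ** 2)

# The tone's energy sits in a handful of bins, so truncation barely hurts it;
# the noise spreads its energy across the whole spectrum and loses most of it.
err_tone = topk_error(tone, 32)
err_noise = topk_error(noise, 32)
```

Real codecs are far more sophisticated, but the underlying constraint is the same: flat-spectrum sounds have no small set of coefficients that captures them well.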
userbinator · 2 hours ago

Perhaps this is what the human is for: to apply an EQ curve.
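In practice that fix is a de-esser-style cut. A minimal FFT-domain sketch (the band edges and gain here are made-up example values, not any standard preset):

```python
import numpy as np

def deess(x, sr, band=(5000.0, 9000.0), gain_db=-6.0):
    """Crude static EQ: attenuate the sibilance band by gain_db.
    A real de-esser would act dynamically, only when sibilance is loud."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    X[mask] *= 10 ** (gain_db / 20.0)  # -6 dB is roughly half amplitude
    return np.fft.irfft(X, n=len(x))
```

A static cut like this dulls everything in the band, which is why a human with a dynamic EQ still does it better.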
AlphaAndOmega0 · 11 hours ago

Agreed. I find that particularly annoying, and I've also noticed that the spatial arrangement, or stereo image, is muted for most instruments (or the model simply doesn't use the stereo field as well as a good human musician would).
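One rough way to put a number on that impression (my own ad-hoc metric, not an industry standard) is the side-to-mid energy ratio of a stereo pair: near zero for effectively mono material, around one for fully decorrelated channels.

```python
import numpy as np

def stereo_width(left, right):
    """Side/mid energy ratio: ~0 for mono, ~1 for decorrelated channels."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return np.sum(side ** 2) / (np.sum(mid ** 2) + 1e-12)
```

Comparing this metric across a generated track and a human mix of the same style would be one way to test the "muted stereo" hypothesis.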
gowld · 3 hours ago

I suspect it's because these models generate music as a raw waveform, incrementally rather than globally, so they favor smoothly varying sounds over sharp contrasts. If a model generated MIDI data and then used a MIDI synth to render the audio, you wouldn't get that.
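To make the contrast concrete, here's a toy sketch of the symbolic route (the event format and sine synth are invented for illustration): the model would only emit note events, and the synth produces the waveform, so transients are as crisp as the synth makes them, regardless of the model.

```python
import numpy as np

SR = 22050  # sample rate, Hz

def midi_to_hz(note):
    """Equal-tempered pitch: MIDI note 69 = A440."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render(events, total_s):
    """Render (midi_note, start_s, dur_s) events with a naive sine synth."""
    out = np.zeros(int(total_s * SR))
    for note, start, dur in events:
        n = int(dur * SR)
        t = np.arange(n) / SR
        voice = np.sin(2 * np.pi * midi_to_hz(note) * t)
        voice *= np.exp(-3 * t)  # simple pluck-like decay envelope
        i = int(start * SR)
        out[i:i + n] += voice
    return out

# C major arpeggio: the onsets are perfectly sharp by construction.
audio = render([(60, 0.0, 0.5), (64, 0.5, 0.5), (67, 1.0, 0.5)], 1.5)
```

The trade-off, of course, is that a symbolic model can only produce sounds its synth knows how to make: no vocals, no timbre invention.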