daemonologist · 11 hours ago

It's interesting to me that all AI music sounds slightly sibilant - like someone taped a sheet of paper to the speaker or covered my head in dry leaves. I know no model is perfect, but I'd have thought they'd have ironed out this problem by now, given how pervasive it is and how significantly it degrades the end product.
recursive · 6 hours ago

I've noticed this too, and I have a few theories. Disclosure: I know a little about audio, and very little about generative audio AI.

First, perhaps the models are trained on relatively low-bitrate encodings. Just as image models sometimes reproduce JPEG artifacts, we could be hearing the familiar high-frequency loss of low-bitrate audio codecs.

Another idea: 'S' and 'T' sounds and similar consonants are relatively broad-spectrum, not unlike white noise. That kind of sound is notoriously difficult for lossy frequency-domain encoding schemes, which spend their bits on a few strong spectral components. Perhaps these models work in a similar domain and are subject to similar constraints. There may be a trade-off here between a dull low-passed sound and "warbly" artifacts, and we're hearing a middle-ground compromise.

I don't know exactly how it happens, but when I hear the "AI" sound in music, this is usually one of the first tells.
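The frequency-domain point can be illustrated with a toy experiment (my own sketch, not anything from the actual models or codecs): reconstruct a signal from only its k largest DFT bins, as a crude stand-in for a lossy frequency-domain coder, and compare how much energy a steady tone loses versus a broadband, 's'-like noise burst.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sr = 2048, 8000
tone = np.sin(2 * np.pi * 220 * np.arange(n) / sr)  # steady pitched tone
noise = rng.standard_normal(n)                      # broadband, "s"-like hiss

def topk_error(x, k):
    """Reconstruct x from only its k largest-magnitude DFT bins;
    return the fraction of signal energy lost."""
    X = np.fft.rfft(x)
    keep = np.argsort(np.abs(X))[-k:]
    Xq = np.zeros_like(X)
    Xq[keep] = X[keep]
    xq = np.fft.irfft(Xq, n=len(x))
    return np.sum((x - xq) ** 2) / np.sum(x ** 2)

# The tone's energy sits in a handful of bins, so truncation barely hurts it;
# the noise spreads its energy across the whole spectrum and loses most of it.
err_tone = topk_error(tone, 32)
err_noise = topk_error(noise, 32)
```

Real codecs are far more sophisticated, but the underlying constraint is the same: flat-spectrum sounds have no small set of coefficients that captures them well.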
userbinator · 2 hours ago

Perhaps this is what the human is for: to apply an EQ curve.
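In practice that fix is a de-esser-style cut. A minimal FFT-domain sketch (the band edges and gain here are made-up example values, not any standard preset):

```python
import numpy as np

def deess(x, sr, band=(5000.0, 9000.0), gain_db=-6.0):
    """Crude static EQ: attenuate the sibilance band by gain_db.
    A real de-esser would act dynamically, only when sibilance is loud."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    X[mask] *= 10 ** (gain_db / 20.0)  # -6 dB is roughly half amplitude
    return np.fft.irfft(X, n=len(x))
```

A static cut like this dulls everything in the band, which is why a human with a dynamic EQ still does it better.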
AlphaAndOmega0 · 11 hours ago

Agreed. I find that particularly annoying, and I've also noticed that the spatial arrangement, or stereo image, is muted for most instruments (or the model simply doesn't use the stereo field as well as a good human musician would).
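One rough way to put a number on that impression (my own ad-hoc metric, not an industry standard) is the side-to-mid energy ratio of a stereo pair: near zero for effectively mono material, around one for fully decorrelated channels.

```python
import numpy as np

def stereo_width(left, right):
    """Side/mid energy ratio: ~0 for mono, ~1 for decorrelated channels."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return np.sum(side ** 2) / (np.sum(mid ** 2) + 1e-12)
```

Comparing this metric across a generated track and a human mix of the same style would be one way to test the "muted stereo" hypothesis.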
gowld · 3 hours ago

I suspect it's because these models generate music as a raw waveform, incrementally rather than globally, so they favor smoothly varying sounds over sharp contrasts. If a model generated MIDI data and then used a MIDI synth to render the audio, you wouldn't get that.
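To make the contrast concrete, here's a toy sketch of the symbolic route (the event format and sine synth are invented for illustration): the model would only emit note events, and the synth produces the waveform, so transients are as crisp as the synth makes them, regardless of the model.

```python
import numpy as np

SR = 22050  # sample rate, Hz

def midi_to_hz(note):
    """Equal-tempered pitch: MIDI note 69 = A440."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render(events, total_s):
    """Render (midi_note, start_s, dur_s) events with a naive sine synth."""
    out = np.zeros(int(total_s * SR))
    for note, start, dur in events:
        n = int(dur * SR)
        t = np.arange(n) / SR
        voice = np.sin(2 * np.pi * midi_to_hz(note) * t)
        voice *= np.exp(-3 * t)  # simple pluck-like decay envelope
        i = int(start * SR)
        out[i:i + n] += voice
    return out

# C major arpeggio: the onsets are perfectly sharp by construction.
audio = render([(60, 0.0, 0.5), (64, 0.5, 0.5), (67, 1.0, 0.5)], 1.5)
```

The trade-off, of course, is that a symbolic model can only produce sounds its synth knows how to make: no vocals, no timbre invention.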