jagged-chisel 6 hours ago

Tangent:

I assume this “AI-generated” music is created the same way an LLM generates text: use samples from a corpus strung together into a new [derivative] output.

But it seems plausible that algorithmic generation can be used at any stage of the process. How much disclosure do we (listeners) require? At what point is it unacceptable “AI-generated” music?

The answers are going to be subjective. And human. Dealing with this, I think, is going to go in the direction of the “typewriters in college” headline from a few days ago: human involvement, low automation, things that don’t scale.

darth_avocado 5 hours ago

> use samples from a corpus strung together into a new [derivative] output.

That’s kind of how the music industry produces music these days. A few songwriters write for most artists, music producers sample other music to string together songs for most artists, etc. That’s why most music sounds the same and why AI-generated music can be indistinguishable from mainstream music.

taneq 5 hours ago

I mean, it's reputedly how Mozart did it with dice, too. This is just much quicker and more comprehensive.

Kye 5 hours ago

My understanding is music generation is more like stable diffusion. It generates a waveform as an image, then turns it into an audio file.

cubefox 5 hours ago

They do use diffusion models, but I don't think they would make a detour via images. They can just generate audio directly with audio diffusion rather than image diffusion.

corysama 5 hours ago

There technically was one experiment early on to trick Stable Diffusion into generating spectrograms that could be converted into audio. And, it worked surprisingly well.

https://web.archive.org/web/20230314190913/https://www.riffu...

https://huggingface.co/riffusion/riffusion-model-v1

But I'd expect everything from the past 3 years to diffuse the audio waveform directly.
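
For anyone wondering what "spectrograms that could be converted into audio" means in practice, here is a rough sketch. It is not Riffusion's actual code: a magnitude spectrogram has no phase information, so real systems generally need a phase-reconstruction step (Griffin-Lim is a common choice). This toy version swaps in a much cruder oscillator bank that plays one sine wave per frequency bin. The function names, `frame_len`, and the hand-made spectrogram are all made up for illustration.

```python
import math

def spectrogram_to_audio(spec, frame_len=256):
    """Crude oscillator-bank resynthesis (illustrative, not Griffin-Lim).

    spec is a list of frames; each frame is a list of magnitudes, where
    index k is treated as a sine completing k cycles per frame_len samples.
    Every frame contributes frame_len output samples, with no overlap.
    """
    samples = []
    for frame in spec:
        for n in range(frame_len):
            value = 0.0
            for k, mag in enumerate(frame):
                if mag:
                    # Each bin completes whole cycles per frame, so the
                    # phase lines up across frame boundaries on its own.
                    value += mag * math.sin(2 * math.pi * k * n / frame_len)
            samples.append(value)
    return samples

def bin_strength(samples, k, frame_len=256):
    """Correlate the signal with bin k's sinusoid (a single DFT-style probe)."""
    re = sum(x * math.cos(2 * math.pi * k * n / frame_len)
             for n, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * k * n / frame_len)
             for n, x in enumerate(samples))
    return math.hypot(re, im) / len(samples)

# Hypothetical "image": 4 frames, 32 bins, all energy in bin 10.
spec = [[0.0] * 32 for _ in range(4)]
for frame in spec:
    frame[10] = 1.0

audio = spectrogram_to_audio(spec, frame_len=64)
```

Probing `audio` with `bin_strength` should show the energy sitting in bin 10 and essentially nothing elsewhere. Abrupt magnitude changes between frames would click with this approach, which is part of why real pipelines use overlapping windows and proper phase estimation.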

Kye 4 hours ago

That's probably what I was thinking of. I haven't kept up as much on non-text generative AI.