Kye 5 hours ago
My understanding is that music generation is more like Stable Diffusion. It generates a waveform as an image, then turns it into an audio file.
cubefox 5 hours ago
They do use diffusion models, but I don't think they would make a detour via images. They can just generate audio directly with audio diffusion rather than image diffusion.