VanTheBrand a day ago

The metadata is arguably more useful than the music files themselves.

vintermann 21 hours ago | parent | next [-]

Self-supplied metadata in music catalogs is notoriously shit. The degree to which most rights owners don't give a damn is telling.

Spotify's own metadata is not particularly sophisticated: "Valence", "Energy", "Danceability", etc. You can see from a mile away that these are names assigned to PCA axes, which actually correspond pretty poorly to musical concepts, because whatever they analyzed isn't nicely linearly separable.

cm2012 a day ago | parent | prev [-]

Especially since they scraped Spotify's popularity rating as well

input_sh a day ago | parent [-]

I can't think of many situations where that would be particularly valuable, considering it favours recent plays and the cutoff date is already almost half a year old.

cm2012 a day ago | parent | next [-]

It's useful as a training signal: it helps an algorithm learn which music is popular.

skrtskrt a day ago | parent | prev [-]

If those are all the issues with the dataset, it is probably far and away the best dataset any researcher has ever used.