| ▲ | SkyBelow 4 hours ago | |
I've asked for non 1:1 versions and have been refused. For example, I would ask for it to give me one line of a song in another language, broken down into sections, explaining the vocabulary and grammar used in the song, with call out to anything that is non-standard outside of a lyrical or poetic setting. Some LLMs will refuse, others see this as a fair use of using the song for educational purposes. So far all I've tried are willing to return a random phrase or grammar used in a song, so it is only getting to asking for a line of lyrics or more that it becomes troublesome. (There is also the problem that the LLMs who do comply will often make up the song unless they have some form of web search and you explicitly tell them to verify the song using it.) | ||
| ▲ | bilbo0s an hour ago | parent [-] | |
I would ask for it to give me one line of a song in another language, broken down into sections, explaining the vocabulary and grammar used in the song, with call out to anything that is non-standard outside of a lyrical or poetic setting. I know no one wants to hear this from the cursed IP attorney, but this would be enough to show in court that the song lyrics were used in the training set. So depending on the jurisdiction you're being sued in, there's some liability there. This is usually solved by the model labs getting some kind of licensing agreements in place first and then throwing all that in the training set. Alternatively, they could also set up some kind of RAG workflow where the search goes out and finds the lyrics. But they would have to both know that the found lyrics where genuine, and ensure that they don't save any of that chat for training. At scale, neither of those are trivial problems to solve. Now, how many labs have those agreements in place? Not really sure? But issues such as these are probably why you get silliness like DeepMind models not being licensed for use in the EU for instance. | ||