| ▲ | dalemhurley 11 hours ago |
| Song lyrics. Not illegal. I can google them and see them directly on Google. LLMs refuse. |
|
| ▲ | probably_wrong 9 hours ago |
| While the issue is far from settled, OpenAI recently lost a trial in a German court regarding their usage of lyrics for training: https://news.ycombinator.com/item?id=45886131 |
| |
| ▲ | observationist 43 minutes ago | | Tell Germany to make their own internet, make their own AI companies, give them a pat on the back, then block the entire EU. Nasty little bureaucratic tyrants. EU needs to get their shit together or they're going to be quibbling over crumbs while the rest of the globe feasts. I'm not inclined to entertain any sort of bailout, either. |
|
|
| ▲ | charcircuit 10 hours ago |
| >Not illegal Reproducing a copyrighted work 1:1 is infringing. Other sites on the internet have to license the lyrics before sending them to a user. |
| |
| ▲ | SkyBelow 4 hours ago | | I've asked for non-1:1 versions and have been refused. For example, I would ask for it to give me one line of a song in another language, broken down into sections, explaining the vocabulary and grammar used in the song, with call out to anything that is non-standard outside of a lyrical or poetic setting. Some LLMs will refuse, others see this as fair use of the song for educational purposes. So far all the ones I've tried are willing to return a random phrase or grammar point used in a song, so it is only when asking for a full line of lyrics or more that it becomes troublesome. (There is also the problem that the LLMs that do comply will often make up the song unless they have some form of web search and you explicitly tell them to verify the song using it.) | | |
| ▲ | bilbo0s an hour ago | | >I would ask for it to give me one line of a song in another language, broken down into sections, explaining the vocabulary and grammar used in the song, with call out to anything that is non-standard outside of a lyrical or poetic setting. I know no one wants to hear this from the cursed IP attorney, but this would be enough to show in court that the song lyrics were used in the training set. So depending on the jurisdiction you're being sued in, there's some liability there. This is usually solved by the model labs getting some kind of licensing agreements in place first and then throwing all that in the training set. Alternatively, they could also set up some kind of RAG workflow where the search goes out and finds the lyrics. But they would have to both know that the found lyrics were genuine, and ensure that they don't save any of that chat for training. At scale, neither of those is a trivial problem to solve. Now, how many labs have those agreements in place? Not really sure. But issues such as these are probably why you get silliness like DeepMind models not being licensed for use in the EU, for instance. |
|
|
|
| ▲ | sigmoid10 10 hours ago |
| It actually works the same as on Google. As in, ChatGPT will happily give you a link to a site with the lyrics without issue (regardless of whether the third-party site provider has any rights or not). But in the search/chat itself, you can only see snippets or small sections, not the entire text. |
| |
| ▲ | hirako2000 5 hours ago | | 1. ChatGPT is the publisher; Google is a search engine that links to publishers. 2. LLMs typically don't produce content verbatim. Some LLMs do provide references, but the output remains a pasta of sentences worded differently. You are asking GPT to publish verbatim content which may be copyrighted; that would be deemed infringement, since even non-verbatim output is already crossing the line. |
|
|
| ▲ | tripzilch 5 hours ago |
| Related: GPT refuses to identify screenshots from movies or TV series. Not for any particular reason, it flat out refuses. I asked it whether it could describe the picture for me in as much detail as possible, and it said it could do that. I asked it whether it could identify a movie or TV series by description of a particular scene, and it said it could do that, but that if I ever tried or asked it to do both, it wouldn't, because that would be circumvention of its guidelines! -- No, it doesn't quite make sense, but to me it does seem quite indicative of a hard-coded limitation/refusal, because it is clearly able to do the subtasks. I don't think the ability to identify scenes from a movie or TV show is illegal or even immoral, but I can imagine why they would hard-code this refusal: because it would make it easier to show it was trained on copyrighted material? |