blt 8 hours ago

What makes this different from linking to a random zip file somewhere?

zythyx 8 hours ago | parent | next [-]

Microsoft could have used any dataset for their blog; they could even have chosen to use actual public domain novels. Instead, they opted to use copyrighted works that JK hasn't released into the public domain (unless user "Shubham Maindola" is JK's alter ego).

bossyTeacher 6 hours ago | parent [-]

Rowling is known for using pseudonyms. Maybe she got tired of writing and decided to break into LLM tech.

Lerc 8 hours ago | parent | prev | next [-]

The licence?

If it comes from a site claiming it was under a licence when it was not, the misdeed is done by the person who provided the version carrying the licence.

wongarsu 7 hours ago | parent | next [-]

Just because it says "CC0" does not make it CC0. If you upload a dataset you don't have the rights to, any license declaration you make is null and void, and anyone using it as if it had that license is violating copyright.

Even if MS could claim that they were acting in good faith, there really isn't much legal wiggle room for that. But it doesn't even come to that, because I don't think anyone would buy that they really thought the Harry Potter books were under CC0.

noosphr 2 hours ago | parent [-]

If you buy a pirated book on Amazon you get to keep the book and the pirate printer is the one prosecuted.

Same thing applies here.

Up to 80% of all works that are within their copyright terms are accidentally in the public domain. A well-known example is Night of the Living Dead. It is not your job to check that the copyright on a work you use is the correct one.

nhinck2 an hour ago | parent [-]

The only reason you get to keep the book is that no one bothers to enforce the law; this doesn't make it legal.

And it is your job to check that you have the rights to use other people's work. Ignorance is not a defence.

ribosometronome 31 minutes ago | parent [-]

>the law

Which ones? As far as I was aware, it's a crime to redistribute copyrighted works, not to receive them.

slopinthebag 7 hours ago | parent | prev [-]

Oh come on. The licence was obviously incorrect and you can't escape culpability because of that.

fxwin 8 hours ago | parent | prev | next [-]

The licensing: If I steal something and tell you it's free and yours for the taking, that feels different than a fence (knowingly) buying stolen goods. It's obviously semantics and there should have been some better judgement from MS, but downloading a dataset (stated as public domain) from kaggle feels spiritually different from piracy (e.g.: if someone uploads a lesser-known, copyrighted data set to kaggle/huggingface under an incorrect license, are tutorials that use this data set a 'guide to pirating' this data set? To me, that feels like a wrong use of the term).

philipwhiuk 6 hours ago | parent | prev [-]

The 'artwork' they generated and the text on the blog post?