andsoitis 10 hours ago

This article is from 2024 and points to Kaggle, which hosts the data set.

I'm surprised that JKR's people haven't come down like a tonne of bricks on Kaggle / Microsoft.

Does anyone know whether there is some special reason why this has lasted so long without being taken down?

anonymous908213 9 hours ago | parent | next [-]

My best guess is that it flew under the radar. The Kaggle dataset has 'only' 10,000 downloads, and the article itself probably doesn't have that many views. Still, this seems pretty far beyond the pale. Given the other case of AI-related plagiarism by Microsoft that was on the front page[1], it seems whatever review process they have for content that is published by their employees, if there is any review process at all, is deeply flawed.

[1] https://news.ycombinator.com/item?id=47057829, "Microsoft morged my diagram". It was in a discussion there that someone pointed out this article linking to full downloads of the Harry Potter novels, which I thought deserved more visibility.

zythyx 9 hours ago | parent | next [-]

Also, I imagine that most of those 10k downloads are probably from AI trainers who are just speedrunning through Kaggle to obtain absolutely anything to train their AI. There are definitely other, more 'known' ways to obtain these books without finding them as random text files in an AI dataset operation.

selridge 9 hours ago | parent | prev [-]

Why did you think that?

anonymous908213 9 hours ago | parent [-]

It rubs me the wrong way that corporations get a free pass on copyright infringement, while the rest of us are prosecuted as harshly as possible if caught. I think this, together with the morging plagiarism, also indicates a pattern of behaviour from Microsoft that should be reformed. I would prefer if Microsoft were not able to produce AI slop degradations of other people's work and claim it as their own.

walletdrainer 7 hours ago | parent | next [-]

> while the rest of us are prosecuted as harshly as possible if caught

But this is just a lie.

Approximately nobody is prosecuted for copyright infringement.

queenkjuul 3 hours ago | parent [-]

Okay but people have had their lives ruined deliberately by media companies over it. I'm sure you knew what they meant.

walletdrainer 3 hours ago | parent [-]

No matter how generously you want to interpret it, it’s obviously false.

We’re moving the goalposts from the government systematically targeting normal people “if caught”, to only a handful of civil cases.

eggsome 2 hours ago | parent [-]

Sure, as a percentage it's very rare - but some people have died as a result: https://en.wikipedia.org/wiki/Aaron_Swartz

I think most would agree that cases like that act as a deterrent?

walletdrainer 19 minutes ago | parent [-]

That’s not even more than tangentially copyright-related?

> I think most would agree that cases like that act as a deterrent?

I think we could hardly get any further from “the rest of us are prosecuted as harshly as possible if caught”.

ryandrake 8 hours ago | parent | prev [-]

In general, if you want to get away with a crime, just do it as a corporation or as a billionaire.

blibble 8 hours ago | parent | prev [-]

brb poking Rowling on twitter

(done, contacted her lawyers too)

k__o 6 hours ago | parent | next [-]

make sure u worded it right or she'll block you
