▲ | simonw 5 days ago
I agree that AI companies do all sorts of shady stuff to accumulate training data. See Anthropic's recent lawsuit, which I covered here: https://simonwillison.net/2025/Jun/24/anthropic-training/

That's why I care so much about differentiating between the shady stuff that they DO and the stuff that they don't. Saying "we will obey your robots.txt file" and lying about it is a different category of shady. I care about that difference.
▲ | simoncion 2 days ago
> That's why I care so much about differentiating between the shady stuff that they DO and the stuff that they don't.

Ah, good. So you have solid evidence that they're NOT doing shady stuff. Great! Let's have it.

"It's unfair to require me to prove a negative!" you say? Sure, that's a fair objection... but my counter to that is "We'll only get solid evidence of dirty dealings if an insider turns stool pigeon."

So, given that we're certainly not going to get solid evidence, we must base our evaluation on the behavior of the companies in other big projects.

Over the past few decades, Google, Facebook, and Microsoft have not demonstrated that they're dedicated to behaving ethically. (And their behavior has gotten far, far worse over the past few years.) OpenAI's CEO is plainly and obviously a manipulator and savvy political operator. (Remember how he once declared that it was vitally important that he could be fired?) Anthropic's CEO just keeps lying to the press [0] in order to keep fueling AGI hype.

[0] Oh, pardon me. He's "making a large volume of forward-looking statements that, due to ever-evolving market conditions, turn out to be inaccurate". I often get that concept confused with "lying". My bad.
▲ | cyphar 5 days ago
Maybe I'm the outlier here, but I think intentionally torrenting millions of books and taking great pains to try to avoid linking the activity to your company is far beyond something as "trivial" as ignoring robots.txt. This is like wringing your hands over whether a serial killer also jaywalked on their way to the crime scene.

(In theory the former is supposed to be a capital-C criminal offence -- felony copyright infringement.)