brokensegue 3 days ago
I asked them why they did this. The answer, surprisingly, is that they fear they will get blocked if they release the full dumps, because of the AI scraping wars.
cedws 3 days ago
Feels like a bit of a kick in the teeth that I contributed towards archiving something that I don’t even get access to. What happens if they disappear? The dataset is gone forever.
mdaniel 2 days ago
This whole thread is starting to read like some kind of misguided practical joke. I also recognize that it may seem like this is directed toward you, but I'm not shooting the messenger; I'm just anchoring my reply under this new information. Sorry about that.

But, ok, let's continue in good faith:

scenario 1: they don't want to uncork the .warc files because it will potentially leak the means and methods of the Archive Warrior or its usages

scenario 2: they don't want to expose the target of the redirects because it will feed the boundaries of the ravenous AI slurp machines

If it's scenario 1, then CSV exists and allows mapping from the 00aa11 codes to the "location:" header, no means and methods necessary.

If it's scenario 2, then what the hell were they expecting to happen? Embargo the .warc until the AI hype blows over so their great-grandchildren can read about how the Internet was back in the day?

I guess the real question is "archive for whom?" Right now, unless they have a back-channel way to feed the Wayback Machine's boundary using the .warc files, so that it secretly populates the Wayback without wholesale feeding the AI boundary, this whole thing is just mysterious.
globular-toast 3 days ago
Who fears they will get blocked by whom?