otherme123 | 3 hours ago
They all suck. OpenAI ignores crawl limits and disallowed routes in robots.txt; after a 429 "Too Many Requests" they retry the same URL half a dozen times from different IPs over the next couple of minutes, and they once DoS'ed my small VPS trying to do a full scan of sitemaps.xml in less than an hour, retrying whenever any endpoint failed. Google and the others at least respect both robots.txt and 429s: they invested years in crawling the whole internet, so now they can train on what they already have stored on their servers. OpenAI seems to assume that MY resources are theirs.
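For contrast, here is a minimal sketch of what a polite crawler is expected to do: check robots.txt before fetching, honor any crawl delay, and back off when the server returns 429 instead of immediately retrying from other IPs. The user agent name "ExampleBot" and the fixed fallback wait are illustrative assumptions, not any real crawler's configuration.

    import time
    import urllib.robotparser
    import requests

    USER_AGENT = "ExampleBot"  # hypothetical crawler identity

    def fetch_politely(url, robots_url):
        # Parse the site's robots.txt before touching the URL.
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(robots_url)
        rp.read()

        # Honor disallowed routes.
        if not rp.can_fetch(USER_AGENT, url):
            return None

        # Honor an explicit Crawl-delay if the site sets one.
        delay = rp.crawl_delay(USER_AGENT)
        if delay:
            time.sleep(delay)

        resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)

        # On 429, wait for the time the server asks for and give up on this
        # attempt, rather than hammering the same URL again.
        if resp.status_code == 429:
            try:
                wait = int(resp.headers.get("Retry-After", "60"))
            except ValueError:
                wait = 60  # Retry-After can also be an HTTP date; assume seconds here
            time.sleep(wait)
            return None

        return resp.text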