|
| ▲ | PunchyHamster an hour ago | parent | next [-] |
| > Scraping static content from a website at near-zero marginal cost to its server, vs scraping an expensive LLM service provided for free, are different things.
| I bet the people being fucking DDoSed by AI bots disagree. Also note the fucking ignorance of assuming it's "static content" and not something that needs code running.
|
| ▲ | mmcwilliams a few seconds ago | parent | prev | next [-] |
| If you assume that the only cost is CPU, maybe, but bandwidth is a completely different issue.
|
| ▲ | not2b 2 hours ago | parent | prev | next [-] |
| I understand why OpenAI is trying to reduce its costs, but the claim that AI crawlers aren't creating very significant load simply isn't true, especially for the crawlers that ignore robots.txt and hide their identities. This is direct financial damage, and it's particularly hard on nonprofit sites that have been around a long time.
| ▲ | stingraycharles 18 minutes ago | parent [-] |
| Are these the ChatGPT and Claude Desktop crawlers we're talking about, or what is it exactly? Are they really creating significant load while not honoring robots.txt? Genuinely interested.
| ▲ | cruffle_duffle 4 minutes ago | parent [-] |
| I bet dollars to doughnuts that 95% of the traffic is from Claude and ChatGPT desktop/mobile apps, not literal content scraping for training.
|
|
|
| ▲ | heyethan 18 minutes ago | parent | prev | next [-] |
| I think this also explains why the checks are moving up the stack. If the real cost is in actually running the app or the model, then just verifying a browser isn’t enough anymore. You need to verify that the expensive part actually happened. Otherwise you’re basically protecting the cheapest layer while the expensive one is still exposed. |
|
| ▲ | sandeepkd 23 minutes ago | parent | prev | next [-] |
| Let's not try to qualify the wrongs by picking one metric and evaluating just one side of it. A static website owner could be running on a very small budget, and scraping from bots can bring down their business too. If anything, the chances of a static website owner burning through their own life savings are probably higher.
|
| ▲ | alsetmusic 26 minutes ago | parent | prev | next [-] |
| Have you not seen the multiple posts that have reached the front page of HN from people taking self-hosted Git repos offline or having their personal blogs hammered to hell? Because if you haven't, they definitely exist and get voted up by the community.
|
| ▲ | nozzlegear 24 minutes ago | parent | prev | next [-] |
| Are they, actually? |
|
| ▲ | bakugo 3 hours ago | parent | prev | next [-] |
| The cost is so marginal that many, many websites have been forced to add Cloudflare captchas or PoW checks before letting anyone access them, because otherwise the server would slow to a crawl from 1000 scrapers hitting it at once.
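For context on the PoW checks mentioned above: they are typically hashcash-style puzzles. The server issues a random challenge, the client must burn CPU to find a nonce whose hash clears a difficulty target, and the server verifies the result with a single hash. A minimal sketch, assuming a simple SHA-256 leading-zeros scheme (the names, challenge format, and difficulty are illustrative, not any particular vendor's protocol):

```python
import hashlib
import secrets

# Illustrative difficulty: the hash must start with this many zero hex digits.
# Real deployments tune this so a human's browser pays milliseconds
# while bulk scraping becomes expensive.
DIFFICULTY = 4

def make_challenge() -> str:
    """Server side: hand the client a fresh random challenge string."""
    return secrets.token_hex(16)

def solve(challenge: str) -> int:
    """Client side: brute-force the smallest nonce whose hash meets the target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single cheap hash confirms the work was done."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

challenge = make_challenge()
nonce = solve(challenge)         # costs the client ~16**4 hash attempts on average
assert verify(challenge, nonce)  # costs the server exactly one hash
```

The asymmetry is the point: each request costs the client tens of thousands of hash attempts on average, while the server spends one hash to check it.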
|
| ▲ | razingeden 2 hours ago | parent | prev | next [-] |
| It is direct financial damage if my server's not on an unmetered connection — after years of bills coming in around $3/mo, I got a surprise >$800 bill for a site nobody on earth appears to care about besides AI scrapers. It hasn't even been updated in years, so hell if I know why it needs to be fetched constantly and aggressively. But fuck every single one of these companies now whining about bots scraping and victimizing them. Here's my violin.
|
| ▲ | swagmoney1606 an hour ago | parent | prev | next [-] |
| And yet I have to pay, in my time and cash, to handle the constant DDoSes from the constant LLM scraping.
|
| ▲ | AtlasBarfed 2 hours ago | parent | prev | next [-] |
| Because you say it is? I obviously disagree. I mean, on top of this we are talking about not-open OpenAI. |
|
| ▲ | karlshea 2 hours ago | parent | prev | next [-] |
| I don’t know what world you live in but it’s not this one. |
|
| ▲ | nslsm 3 hours ago | parent | prev [-] |
| The issue is that there are so many awful webmasters whose websites take hundreds of milliseconds to generate a page and are brought down by a couple of requests per second.
| |
| ▲ | bakugo 3 hours ago | parent [-] |
| OpenAI must be the most awful webmasters of all, then, to need such sophisticated protections.
|