| ▲ | horsawlarway 2 days ago |
| I really don't think this holds. It's vanishingly rare to end up in a spot where your site is getting enough LLM-driven traffic for you to really notice (and I'm not talking out my ass - I host several sites from personal hardware running in my basement). Bots are a thing. Bots have been a thing and will continue to be a thing. They mostly aren't worth worrying about, and at least for now you can throw PoW in front of your site if you are suddenly getting enough traffic from them to care. In the meantime... Your bowl of candy is still there. Still full of your candy for real people to read. That's the fun of digital goods... They aren't "exhaustible" like your candy bowl. No LLM is dumping your whole bowl (they can't). At most - they're just making the line to access it longer. |
|
| ▲ | shiomiru 2 days ago | parent | next [-] |
| > They mostly aren't worth worrying about Well, a common pattern I've lately been seeing is: * Website goes down/barely accessible * Webmaster posts "sorry we're down, LLM scrapers are DoSing us" * Website accessible again, but now you need JS-enabled whatever the god of the underworld is testing this week with to access it. (Alternatively, the operator decides it's not worth the trouble and the website shuts down.) So I don't think your experience about LLM scrapers "not mattering" generalizes well. |
| |
| ▲ | horsawlarway 2 days ago | parent [-] | | Nah - it generalizes fine. They're doing exactly what I said - adding PoW (anubis - as you point out - being one solution) to gate access. That's hardly different than things like Captchas which were a big thing even before LLMs, and also required javascript. Frankly - I'd much rather have people put Anubis in front of the site than cloudflare, as an aside. If the site really was static before, and no JS was needed - LLM scraping taking it down means it was incredibly misconfigured (an rpi can do thousands of reqs/s for static content, and caching is your friend). --- Another great solution? Just ask users to login (no js needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site". My take is that search engines and sites generating revenue through ads are the most impacted. I just don't have all that much sympathy for either. Functionally - I think trying to draw a distinction between accessing a site directly and using a tool like an LLM to access a site is a mistake. Like - this was literally the mission statement of the semantic web: "unleash the computer on your behalf to interact with other computers". It just turns out we got there by letting computers deal with unstructured data, instead of making all the data structured. | | |
| ▲ | krupan 2 days ago | parent | next [-] | | "this was literally the mission statement of the semantic web" which most everyone either ignored or outright rejected, but thanks for forcing it on us anyway? | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | I guess if my options for getting a ramen recipe are - Search for it and randomly click on SEO spam articles all over the place, riddled with ads, scrolling 10,000 lines down to see a generally pretty uninspired recipe or - Use an LLM and get a pretty uninspired recipe I don't really see much difference. And we were already well past the days where I got anything other than the first option using the web. There was a brief window where intentionally searching specific sites like reddit/hn worked, but even that's been gone for a couple years now. The best recipe is going to be the one you get from your friends/family/neighbors anyways. And at least on the LLM side - I can run it locally and peg it to a version without ads. | | |
| ▲ | w00ds 2 days ago | parent [-] | | It's crazy how appealing the irl version you mentioned is, compared to the online version. Looking through a book, meeting people and sharing recipes, etc. The world you're interacting with actually cares about you.
Feels like the net can't ever have that now. |
|
| |
| ▲ | shiomiru 2 days ago | parent | prev | next [-] | | > If the site really was static before, and no JS was needed One does not imply the other. This forum is one example. (Or rather, hn.js is entirely optional.) > Another great solution? Just ask users to login (no js needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site". Accounts don't make sense for all websites. Self-hosted git repositories are one common case where I now have to wait seconds for my phone to burn through enough sha256 to see a readme - but surely you don't want to gate that behind a login either... > My take is that search engines and sites generating revenue through ads are the most impacted. I just don't have all that much sympathy for either. ...and hobbyist services. If we're sticking with Anubis as an example, consider the author's motivation for developing it: > A majority of the AI scrapers are not well-behaved, and they will ignore your robots.txt, ignore your User-Agent blocks, and ignore your X-Robots-Tag headers. They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link viewing the same pages over and over and over and over. Some of them will even click on the same link multiple times in the same second. It's madness and unsustainable. https://xeiaso.net/blog/2025/anubis/ > Functionally - I think trying to draw a distinction between accessing a site directly and using a tool like an LLM to access a site is a mistake. This isn't "a tool" though, it's cloud hosted scrapers of vc-funded startups taking down small websites in their quest to develop their "tool". It is possible to develop a scraper that doesn't do this, but these companies consciously chose to ignore the pre-existing standards for that. Which is why I think the candy analogy fits perfectly, in fact. | |
| ▲ | account42 2 days ago | parent | prev [-] | | > They're doing exactly what I said - adding PoW (anubis - as you point out - being one solution) to gate access. Which is a shit solution where everyone suffers. > Another great solution? Just ask users to login (no js needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site". No I won't create an account to check if a search result has what I'm looking for. Nor will I sign up to a forum before I know what the culture is like. We already had this shit with communities moving to Discord, we don't need to fuck up the remaining web as well. |
|
|
|
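For readers unfamiliar with the proof-of-work gating argued over above (the "burn through enough sha256" step), here is a minimal hashcash-style sketch. It is not Anubis's actual implementation: the function names, the `challenge:nonce` string layout, and the 12-bit difficulty are all assumptions chosen for illustration. The idea is only that the client must search for a nonce, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side (hypothetical): brute-force a nonce that meets the target."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side (hypothetical): one hash is enough to check the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


if __name__ == "__main__":
    # 12 bits means roughly 4096 hashes on average; real deployments tune this.
    nonce = solve("example-challenge", 12)
    print(nonce, verify("example-challenge", nonce, 12))
```

The asymmetry is the whole point: each extra difficulty bit doubles the expected client work while verification stays at one hash, which is also why slow phones pay the most.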
| ▲ | igloopan 2 days ago | parent | prev | next [-] |
| I think you're missing the context that is the article.
The candy in this case is the people who may or may not go to read your e.g. ramen recipe. The real problem, as I see it, is that over time, as LLMs absorb the information covered by that recipe, fewer people will actually look at the search results since the AI summary tells them how to make a good-enough bowl of ramen. The amount of ramen enjoyers is zero-sum. Your recipe will, of course, stay up and accessible to real people but LLMs take away impressions that could have been yours. In regards to this metaphor, they take your candy and put it in their own bowl. |
| |
| ▲ | horsawlarway 2 days ago | parent | next [-] | | So what is the goal behind gathering those impressions? Why do you take this as a problem? And I'm not being glib here - those are genuine questions. If the goal is to share a good ramen recipe... are you not still achieving that? | | |
| ▲ | SamBam 2 days ago | parent [-] | | The internet would not exist if it consisted of people just putting stuff out there, happy that it's released into the wilds of the overall consciousness, and nothing more.
People are willing to put the time and effort into posting stuff for other reasons. Building community, gaining recognition, making money. Even on a website like HN we post under consistent usernames with the vague sense that these words are ours. If posts had no usernames, no one would comment on this site. It's completely disingenuous to say that everyone who creates content -- blog authors, recipe creators, book writers, artists, etc -- should just be happy feeding the global consciousness because then everyone will get a tiny diluted iota of their unattributed wisdom. | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | How old are you? I'm old enough I remember a vivid internet of exactly that. Back when you couldn't make money from ads, and there was no online commerce. Frankly - I think the world might be a much better place if we moved back in that direction a bit. If you're only doing it for money or credit, maybe do something else instead? > If posts had no usernames, no one would comment on this site. I'd still comment. I don't actually give much of a shit about the username attached. I'm here to have a casual conversation and think about things. Not for some bullshit internet street cred. | | |
| ▲ | SamBam 2 days ago | parent | next [-] | | I'm more than old enough to remember the birth of the internet. Back when I had a GeoCities website about aliens (seriously) it was still mine. I had a comments section and I hoped people would comment on it (no one did). I had a counter. I commented on other people's sites in the Area 51 subsection I was listed under. The aim wasn't just to put out my same-ol' unoriginal thoughts into the distributed global consciousness, it was to actually talk to other people. The fact that I wrote it under a dumb handle (a variant of the one I still use everywhere) didn't make me feel less like it was my own individual communication. It's the same for everything else, even the stuff that was completely unattributed. If you put a hilarious animation on YTMND, you know that other people will be referencing that specific one, and linking to it, and saying "did you see that funny thing on YTMND?" It wouldn't have been enough for the audience to just get some diluted, average version of that animation spread out into some global meme-generating AI. So no, "Google Zero" where no one sees the original content and is just "happy that their thoughts are getting out there, somehow" is not something that anyone should wish for. | |
| ▲ | reactordev 2 days ago | parent | prev [-] | | You can’t bring back Compuserve. You both are right; however, it’s the medium that determines one’s point of view on the matter. If I just want to spread my knowledge to the world - I would post on social media. If I want to curate a special viewership and own my own corner of the web - I would post on a blog. If I wanted to set a flag, set up a shop, and say I’m open for business - I would write an app. The internet is all of these things. We just keep being fed the latter. |
|
|
| |
| ▲ | jasonvorhe 2 days ago | parent | prev | next [-] | | That's also trained behavior due to SEO infested recipe sites filled with advertorials, referral links to expensive kitchen equipment, long form texts about the recipe with the recipe hidden somewhere below that. Same goes for other stuff that can be easily propped up with lengthy text stuffed with just the right terms to spam search indexes with. LLMs are just readability on speed, with the downsides of drugs. | |
|
|
| ▲ | lelanthran 2 days ago | parent | prev [-] |
| > I really don't think this holds. Only if you consider DoS as the only downside. As with this analogy: 1. I put out a bowl of (infinite and cost-free) candy, with my name written on each piece so people know where they got the candy. 2. Some other resident, who doesn't have an infinite and cost-free source of candy like I do, comes along and grabs all the candy at periodic intervals. 3. They then scrub my name from all the candy wrappers and replace it with their name. 4. They put out all the candy, pretending it is their candy. This analogy is much more accurate than either mischaracterisation in this thread: 1. I have no objection to the other resident using me as an unlimited source of candy. 2. I object only to them obfuscating their source of candy, instead misrepresenting the candy as their own! Because, you see, no one cared when search engines directed candy-hunters to your door. No one cared when search engines presented the candy with your name still on it. The whole issue, which is unaddressed by your post, is scrubbing the attribution, and then re-attributing the candy. |