| |
| ▲ | horsawlarway 2 days ago | parent | next [-] | | I really don't think this holds. It's vanishingly rare to end up in a spot where your site is getting enough LLM driven traffic for you to really notice (and I'm not talking out my ass - I host several sites from personal hardware running in my basement). Bots are a thing. Bots have been a thing and will continue to be a thing. They mostly aren't worth worrying about, and at least for now you can throw PoW in front of your site if you are suddenly getting enough traffic from them to care. In the meantime... Your bowl of candy is still there. Still full of your candy for real people to read. That's the fun of digital goods... They aren't "exhaustible" like your candy bowl. No LLM is dumping your whole bowl (they can't). At most - they're just making the line to access it longer. | | |
| ▲ | shiomiru 2 days ago | parent | next [-] | | > They mostly aren't worth worrying about Well, a common pattern I've lately been seeing is: * Website goes down/barely accessible * Webmaster posts "sorry we're down, LLM scrapers are DoSing us" * Website accessible again, but now you need JS-enabled whatever the god of the underworld is testing this week with to access it. (Alternatively, the operator decides it's not worth the trouble and the website shuts down.) So I don't think your experience about LLM scrapers "not mattering" generalizes well. | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | Nah - it generalizes fine. They're doing exactly what I said - adding PoW (anubis - as you point out - being one solution) to gate access. That's hardly different than things like Captchas which were a big thing even before LLMs, and also required javascript. Frankly - I'd much rather have people put Anubis in front of the site than cloudflare, as an aside. If the site really was static before, and no JS was needed - LLM scraping taking it down means it was incredibly misconfigured (an rpi can do thousands of reqs/s for static content, and caching is your friend). --- Another great solution? Just ask users to login (no js needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site". My take is that search engines and sites generating revenue through ads are the most impacted. I just don't have all that much sympathy for either. Functionally - I think trying to draw a distinction between accessing a site directly and using a tool like an LLM to access a site is a mistake. Like - this was literally the mission statement of the semantic web: "unleash the computer on your behalf to interact with other computers". It just turns out we got there by letting computers deal with unstructured data, instead of making all the data structured. | | |
| ▲ | krupan 2 days ago | parent | next [-] | | "this was literally the mission statement of the semantic web" which most everyone either ignored or outright rejected, but thanks for forcing it on us anyway? | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | I guess if my options for getting a ramen recipe are (a) search for it and randomly click on SEO spam articles all over the place, riddled with ads, scrolling 10,000 lines down to see a generally pretty uninspired recipe, or (b) use an LLM and get a pretty uninspired recipe, I don't really see much difference. And we were already well past the days where I got anything other than the first option using the web. There was a brief window where intentionally searching specific sites like reddit/hn worked, but even that's been gone for a couple years now. The best recipe is going to be the one you get from your friends/family/neighbors anyways. And at least on the LLM side - I can run it locally and peg it to a version without ads. | | |
| ▲ | w00ds 2 days ago | parent [-] | | It's crazy how appealing the irl version you mentioned is, compared to the online version. Looking through a book, meeting people and sharing recipes, etc. The world you're interacting with actually cares about you.
Feels like the net can't ever have that now. |
|
| |
| ▲ | shiomiru 2 days ago | parent | prev | next [-] | | > If the site really was static before, and no JS was needed One does not imply the other. This forum is one example. (Or rather, hn.js is entirely optional.) > Another great solution? Just ask users to login (no js needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site". Accounts don't make sense for all websites. Self-hosted git repositories are one common case where I now have to wait seconds for my phone to burn through enough sha256 to see a readme - but surely you don't want to gate that behind a login either... > My take is that search engines and sites generating revenue through ads are the most impacted. I just don't have all that much sympathy for either. ...and hobbyist services. If we're sticking with Anubis as an example, consider the author's motivation for developing it: > A majority of the AI scrapers are not well-behaved, and they will ignore your robots.txt, ignore your User-Agent blocks, and ignore your X-Robots-Tag headers. They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link viewing the same pages over and over and over and over. Some of them will even click on the same link multiple times in the same second. It's madness and unsustainable. https://xeiaso.net/blog/2025/anubis/ > Functionally - I think trying to draw a distinction between accessing a site directly and using a tool like an LLM to access a site is a mistake. This isn't "a tool" though, it's cloud hosted scrapers of vc-funded startups taking down small websites in their quest to develop their "tool". It is possible to develop a scraper that doesn't do this, but these companies consciously chose to ignore the pre-existing standards for that. Which is why I think the candy analogy fits perfectly, in fact. | |
| ▲ | account42 2 days ago | parent | prev [-] | | > They're doing exactly what I said - adding PoW (anubis - as you point out - being one solution) to gate access. Which is a shit solution where everyone suffers. > Another great solution? Just ask users to login (no js needed). I'll stand pretty firmly behind "If you aren't willing to make an account - you don't actually care about the site". No, I won't create an account to check if a search result has what I'm looking for. Nor will I sign up to a forum before I know what the culture is like. We already had this shit with communities moving to Discord, we don't need to fuck up the remaining web as well. |
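For readers unfamiliar with the proof-of-work gating argued about above, here is a minimal hashcash-style sketch in Python. It is an illustration of the general idea only and is not taken from Anubis; the function names, the `challenge:nonce` string format, and the difficulty value are all assumptions made for this example. The server hands out a challenge, the visitor's browser searches for a nonce whose SHA-256 digest starts with enough zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # First non-zero byte: add its leading zeros and stop.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side (hypothetical): brute-force a nonce. Cost doubles per extra bit."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side (hypothetical): one hash is enough to check the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


if __name__ == "__main__":
    # 12 bits means roughly 4096 attempts on average; real deployments tune this.
    nonce = solve("example-challenge", 12)
    print(nonce, verify("example-challenge", nonce, 12))
```

The asymmetry is the point both sides of the argument are leaning on: verification is cheap for the site, while the cost lands on every visitor, which is negligible for one page view on a desktop, noticeable on a phone, and expensive for a scraper requesting millions of pages.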
|
| |
| ▲ | igloopan 2 days ago | parent | prev | next [-] | | I think you're missing the context that is the article.
The candy in this case is the people who may or may not go to read your e.g. ramen recipe. The real problem, as I see it, is that over time, as LLMs absorb the information covered by that recipe, fewer people will actually look at the search results since the AI summary tells them how to make a good-enough bowl of ramen. The amount of ramen enjoyers is zero-sum. Your recipe will, of course, stay up and accessible to real people but LLMs take away impressions that could have been yours. In regards to this metaphor, they take your candy and put it in their own bowl. | | |
| ▲ | horsawlarway 2 days ago | parent | next [-] | | So what is the goal behind gathering those impressions? Why do you take this as a problem? And I'm not being glib here - those are genuine questions. If the goal is to share a good ramen recipe... are you not still achieving that? | | |
| ▲ | SamBam 2 days ago | parent [-] | | The internet would not exist if it consisted of people just putting stuff out there, happy that it's released into the wilds of the overall consciousness, and nothing more.
People are willing to put the time and effort into posting stuff for other reasons. Building community, gaining recognition, making money. Even on a website like HN we post under consistent usernames with the vague sense that these words are ours. If posts had no usernames, no one would comment on this site. It's completely disingenuous to say that everyone who creates content -- blog authors, recipe creators, book writers, artists, etc -- should just be happy feeding the global consciousness because then everyone will get a tiny diluted iota of their unattributed wisdom. | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | How old are you? I'm old enough I remember a vivid internet of exactly that. Back when you couldn't make money from ads, and there was no online commerce. Frankly - I think the world might be a much better place if we moved back in that direction a bit. If you're only doing it for money or credit, maybe do something else instead? > If posts had no usernames, no one would comment on this site. I'd still comment. I don't actually give much of a shit about the username attached. I'm here to have a casual conversation and think about things. Not for some bullshit internet street cred. | | |
| ▲ | SamBam 2 days ago | parent | next [-] | | I'm more than old enough to remember the birth of the internet. Back when I had a GeoCities website about aliens (seriously) it was still mine. I had a comments section and I hoped people would comment on it (no one did). I had a counter. I commented on other people's sites in the Area 51 subsection I was listed under. The aim wasn't just to put out my same-ol' unoriginal thoughts into the distributed global consciousness, it was to actually talk to other people. The fact that I wrote it under a dumb handle (a variant of the one I still use everywhere) didn't make me feel less like it was my own individual communication. It's the same for everything else, even the stuff that was completely unattributed. If you put a hilarious animation on YTMND, you know that other people will be referencing that specific one, and linking to it, and saying "did you see that funny thing on YTMND?" It wouldn't have been enough for the audience to just get some diluted, average version of that animation spread out into some global meme-generating AI. So no, "Google Zero" where no one sees the original content and is just "happy that their thoughts are getting out there, somehow" is not something that anyone should wish for. | |
| ▲ | reactordev 2 days ago | parent | prev [-] | | You can’t bring back Compuserve. You’re both right; however, it’s the medium that determines one’s point of view on the matter. If I just want to spread my knowledge to the world - I would post on social media. If I want to curate a special viewership and own my own corner of the web - I would post on a blog. If I wanted to set a flag, set up a shop, and say I’m open for business - I would write an app. The internet is all of these things. We just keep being fed the latter. |
|
|
| |
| ▲ | jasonvorhe 2 days ago | parent | prev | next [-] | | That's also trained behavior due to SEO infested recipe sites filled with advertorials, referral links to expensive kitchen equipment, long form texts about the recipe with the recipe hidden somewhere below that. Same goes for other stuff that can be easily propped up with lengthy text stuffed with just the right terms to spam search indexes with. LLMs are just readability on speed, with the downsides of drugs. | |
| ▲ | 2 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | lelanthran 2 days ago | parent | prev [-] | | > I really don't think this holds. Only if you consider DoS as the only downside. As with this analogy: 1. I put out a bowl of (infinite and cost-free) candy, with my name written on each piece so people know where they got the candy. 2. Some other resident, who doesn't have an infinite and cost-free source of candy like I do, comes along and grabs all the candy at periodic intervals. 3. They then scrub my name from all the candy wrappers and replace it with their name. 4. They put out all the candy, pretending it is their candy. This analogy is much more accurate than either mischaracterisation in this thread: 1. I have no objection to the other resident using me as an unlimited source of candy. 2. I object only to them obfuscating their source of candy, instead misrepresenting the candy as their own! Because, you see, no one cared when search engines directed candy-hunters to your door. No one cared when search engines presented the candy with your name still on it. The whole issue, which is unaddressed by your post, is scrubbing the attribution, and then re-attributing the candy. |
| |
| ▲ | lblume 2 days ago | parent | prev | next [-] | | > these companies are the equivalent of the asshole that dumps the whole bowl into their bag In most cases, they aren't? You can still access a website that is being crawled for the purpose of training LLMs. Sure, DoS exists, but it doesn't seem to be enough of a problem to cause widespread website outages. | | |
| ▲ | rangerelf 2 days ago | parent [-] | | A better analogy is that LLM crawlers are candy store workers going through the houses grabbing free candy and then selling it in their own shop. Scalpers. Knowledge scalpers. | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | Except nothing is actually taken. It's copied. If your goal in publishing the site is to drive eyeballs to it for ad revenue... then you probably care. If your goal in publishing the site is just to let people know a thing you found or learned... that goal is still getting accomplished. For me... I'm not in it for the fame or money, I'm fine with it. | | |
| ▲ | allturtles 2 days ago | parent | next [-] | | I think you're missing a middle ground, of people who want to let people know a thing they found or learned, and want to get credit for it. Among other things, this motivation has been the basis for pretty much the entire scientific enterprise since it started: > But that which will excite the greatest astonishment by far, and which indeed especially moved me to call the attention of all astronomers and philosophers, is this, namely, that I have discovered four planets, neither known nor observed by any one of the astronomers before my time, which have their orbits round a certain bright star, one of those previously known, like Venus and Mercury round the Sun, and are sometimes in front of it, sometimes behind it, though they never depart from it beyond certain limits. [0] [0]: https://www.gutenberg.org/cache/epub/46036/pg46036-images.ht... | |
| ▲ | bbarnett 2 days ago | parent | prev | next [-] | | It's a very simple metric. They had nothing of value, no product, no marketable thing. Then they scanned your site. They had to, along with others. And in scanning your site, they scanned the results of your work, effort, and cost. Now they have a product. I need to be clear here, if that site has no value, why do they want it? Understand, these aren't private citizens. A private citizen might print out a recipe, who cares? They might even share that with friends. OK. But if they take it, then package it, then make money? That is different. In my country, copyright doesn't really punish a person. No one gets hit for copying movies even. It does punish someone, for example, copying and then reselling that work though. This sort of thing should depend on who's doing it. Their motive. When search engines were operating an index, nothing was lost. In fact, it was a mutually symbiotic relationship. I guess what we should really ask, is why on Earth should anyone produce anything, if the end result is no one sees it? And instead, they just read a summary from an AI? No more website, no new data, means no new AI knowledge too. | | |
| ▲ | horsawlarway 2 days ago | parent | next [-] | | I guess I don't derive my personal value from the esteem of others. And I don't mean that as an insult, because I get that different people do things for different reasons, and we all get our dopamine hits in different ways. I just think that if the only reason you choose to do something is because you think it's going to get attention on the internet... Then you probably shouldn't be doing that thing in the first place. I produce things because I enjoy producing them. I share them with my friends and family (both in person and online). That's plenty. Historically... that's the norm. > I guess what we should really ask, is why on Earth should anyone produce anything, if the end result is no one sees it? This is a really rather disturbing view of the world. Do things for you. I make things because I see it. My family sees it. My friends see it. I grow roses for me and my neighbors - not for some random internet credit. I plant trees so my kids can sit under them - not for some random internet credit. | | |
| ▲ | bbarnett 2 days ago | parent | next [-] | | Context. Note that we're having a discussion about people putting up websites, and being upset about AI snarfing that content. > I guess what we should really ask, is why on Earth should anyone produce anything, if the end result is no one sees it? > > And instead, they just read a summary from an AI? The above is referring to that context. To people wanting others to see things, and that, after all, is what this whole website, and this person's concerns, are about. So now that this is reiterated, in the context of someone wanting to show things to the world, why would they produce -- if their goal is lost? This doesn't mean they don't do things privately for their friends and family. This isn't a binary, 0/1 solution. Just because you have a website for "all those other people" to see, doesn't mean you don't share things between your friends and family. So what you seem to dislike, is that anyone does it at all. Because again, people writing for eyeballs at large doesn't mean they aren't also writing separately for their friends or family. It seems to me that you're also creating a schism between "family / friends" and "all those other people". Naturally you care for those close to you, but "those other people" are people too. And some people just see people as... people. People to share things with. Yet you seem to be making that a nasty, dirty thing. | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | And the content is still there for those people. The only folks who miss it are the ones who choose to use an llm instead of looking for something different. I guess my opinion is that you can't "make the horse drink". So instead focus on the groups that care enough to go find your content. Those people still exist. If the only joy you got was "the number of people who look at me!"... Then yes, that number is probably going to go down. But I also really do think that's a generally bad reason to be doing an activity. Again, personalities vary, and I won't deny people (pretty much all of us) crave that type of attention in some form or another. I just think, socially speaking, we're better off with less of that right now. |
| |
| ▲ | Anamon a day ago | parent | prev [-] | | You conflate doing something with sharing it online. A lot of people do things for themselves, then they post about it and share it because they like the idea of someone else enjoying and getting something out of it. The thing LLMs might get them to stop doing is not the doing of the thing, but the sharing, to the detriment of everyone who actually would have liked to see it. And no, people sticking to the LLM summary won't get the ideas I shared. They get a crappy, broken, incoherent, messed-up, bland, averaged version of it. Purified of all the personality, insight and thought it might have had in it. That's why people getting an LLM summary partially derived from their data will never seem like a suitable replacement to someone who does it not for the views or credits, but because they actually want to share something of themselves. I do agree that the solution would best come from the demand side. People would realise the inherent blandness and horseshitness of LLM replies, especially when compared to something written by an actual human with thought and intent, ditch the low-quality LLM turds, and demand real content again. The problem I see right now is that pretty much everyone would prefer the human version to the slop, but the megacorps force-feed the slop and spend billions trying to make it as inconvenient as possible to interact with other humans. |
| |
| ▲ | shkkmo 2 days ago | parent | prev [-] | | > But if they take it, then package it, then make money? That is different But still, also legal. You can't copyright a recipe itself, just the fluff around it. It is totally legal for someone to visit a bunch of recipe blogs, copy the recipes, rewrite the descriptions and detailed instructions and then publish that in a book. This is essentially the same as what LLMs do. So prohibiting this would be a dramatic expansion of the power of copyright. Personally, I don't use LLMs. I hope there will always be people like me that want to see the original source and verify any knowledge. I'm actually hopeful that LLM reduction in search traffic will impact the profitability of SEO clickbait referral link garbage sites that now dominate results on many searches. We'll be left with enthusiasts producing content for the joy of nerding out again. Those sites will still have a following of actually interested people and the rest can consume the soulless summaries from the eventually ad infested LLMs. | | |
| ▲ | bbarnett 2 days ago | parent [-] | | It may be legal in your jurisdiction, but I think this is a more generic conversation than the specific work class being copied. And further, my point is also that other parts of copyright law, at least where I live, view "for profit copying" and "some dude wanting to print out a webpage" entirely differently. I feel it makes sense. Amusingly, I feel that an ironic twist would be a judgement that all currently trained LLMs would be unusable for commercial use. | | |
| ▲ | shkkmo 2 days ago | parent [-] | | > other parts of copyright law, at least where I live, view "for profit copying" and "some dude wanting to print out a webpage" entirely different. I don't know what your jurisdiction is; however, through treaties, much of how USA copyright law works has been exported to many other countries, so it is a reasonable place to base discussion. In the USA commercial vs. non-commercial is not sufficient to determine if copying violates copyright law. It is one of several factors that are used to determine "fair use" and while it definitely helps, non-commercial use can easily infringe (torrents) and commercial use can be fine (telephone book white pages). > a judgement that all currently trained LLMs, would be unusable for commercial use I sure hope not. I don't like or use LLMs but I also don't like copyright law and I'd hate to see it receive such an expansion of power. | | |
| ▲ | bbarnett 2 days ago | parent [-] | | > much of how USA copyright law works has been exported to many other countries I'm not blaming you for bringing it up, however I did make it clear that I was speaking of a different jurisdiction. And yes, of course you're right, it's always a "big deal" when trade negotiations come up. Canada has multiple different things in play to protect the individual. The non-profiting dude. Fair use is one, far expanded. Notice-and-notice is another, which currently means you have to pay to send an 'infringed' notice to people, as a copyright owner. Damages are also capped, at an amount that makes legal action untenable for most. And the bar of proof is significantly higher. And that's for torrents. For years we've had things like "you pay a tiny tax on hard drives", but then "that means you've already paid for anything you'll ever copy" and the tax goes into a fund to pay Canadian artists. While this may seem strange, it's one solution we've had to help keep art alive, but also not punish the average citizen with crazy lawsuits, and insane attacks from massive law firms. Essentially, we don't let the US bully us into agreements which are massively harmful to our citizens. But back to the LLM side. I see the current situation as a weakening of copyright law, a massive one. And not for the average joe, but instead for the most commercial of entities. I want copyright law, in some circumstances, to be weakened for people. Not companies. They get to pay artists. Creators. Developers. And of course, there'd be no GPL without copyright law. So while I agree for individuals, especially in the US, copyright law is very annoying and a problem? Let's again focus on what I'm saying. It currently isn't, and doesn't have to be, an absolute. We can have, and as we've both discussed already do have, different outcomes for copyright. E.g. both for fair use and breach outcomes, for corporations/for-profit and just some person.
So let's stop talking about copyright being stronger/weaker in general, and get specific. I support weaker outcomes of breach, and enhanced fair use for people. I support stronger outcomes of breach, and so forth for companies. Further, I support sliding scales too. A one person youtuber isn't the same as a 10B company. A person playing parts of one song in their video for a few seconds, as a one person corp, isn't the same as an entity scanning all of humankind's knowledge and laughing in our faces. Huge differences of scale and scope. Look at it this way. Some of these companies have downloaded torrents. If a person did what they did, they'd receive billions in fines!! Yet they're getting a lesser outcome, as in freaking nothing. It's the wrong place for copyright weakening. | | |
| ▲ | shkkmo 2 days ago | parent [-] | | > I see the current situation a weakening of copyright law, a massive one. And not for the average joe, but instead for the most commercial of entities. You're gonna have to explain this in more detail because it isn't clear to me how you justify this claim. What exactly is being weakened? In what way? > Some of these companies have downloaded torrents. If a person did what they did, they'd receive billions in fines!! The one I am assuming you are referring to is Meta, and they are getting sued. They arguably should also be facing criminal charges too under current law. > Yet they're getting a lesser outcome, as in freaking nothing. That court case hasn't finished and that doesn't have anything directly to do with LLMs but with our legal system and power/wealth imbalances. > And of course, there'd be no GPL without copyright law. I personally strongly prefer MIT to GPL. GPL sort of makes sense as a reaction to copyright law but I don't think GPL justifies the existence or state of copyright law. > Further, I support sliding scales too. What does that mean? Just the fines / judgements because along with having to pay, the activity itself must be stopped. If copyright only prohibited larger entities from copying, it would be less onerous and would make copyright more tolerable, but I don't think that would solve the AI training issue in any way and seems like a tangent. > an entity scanning all of humankind's knowledge and laughing in our faces. Knowledge is not copyrightable. If you want to stop this, expanding the power of copyright to make learning/knowing something an infringing activity is one of the worst possible ways to go about it. | | |
| ▲ | rangerelf a day ago | parent [-] | | > The one I am assuming you are referring to is Meta, and they are getting sued. They arguably should also be facing criminal charges too under current law. I think your assumption falls short: it's not just Meta, it's OpenAI, it's Anthropic, it's Google, and Microsoft, and others. Like you said, the court case hasn't finished, but there's meddling from the White House already; I really doubt there's going to be any fair play in this case. |
|
|
|
|
|
| |
| ▲ | lelanthran 2 days ago | parent | prev | next [-] | | > If your goal in publishing the site is just to let people know a thing you found or learned... that goal is still getting accomplished. I like how you posted so many times in this thread, with the assertion that that is the goal of people giving away stuff for free. Your responses in this thread are almost textbook example of Strawman Argument; you could not do a better Strawman Argument even if you tried! | |
| ▲ | CJefferson 2 days ago | parent | prev [-] | | It's absolutely fine for you to be fine with it. What is nonsense is how copyright laws have been so strict, and suddenly AI companies can just ignore everyone's wishes. | | |
| ▲ | horsawlarway 2 days ago | parent [-] | | Hey - no argument here. I don't think the concept of copyright itself is fundamentally immoral... but it's pretty clearly a moral hazard, and the current implementation is both terrible at supporting independent artists, and a beat stick for already wealthy corporations and publishers to use to continue shitting on independent creators. So sure - I agree that watching the complete disregard for copyright is galling in its hypocrisy, but the problem is modern copyright, IMO. ...and maybe also capitalism in general and wealth inequality at large - but that's a broader, complicated, discussion. |
|
|
|
| |
| ▲ | reactordev 2 days ago | parent | prev [-] | | More like when the project kids show up in the millionaire neighborhood because they know they’ll get full size candy bars. It’s not that there’s none for the others. It’s that there was this unspoken agreement, reinforced by the last 20 years, that website content is protected speech, protected intellectual property, and is copyrightable to its owner/author. Now, that trust and good faith is broken. | | |
| ▲ | account42 2 days ago | parent [-] | | Ah yes, of course, the poor poor AI companies getting scraps from the greedy independent website operators. |
|
|