lblume 2 days ago

> these companies are the equivalent of the asshole that dumps the whole bowl into their bag

In most cases, they aren't? You can still access a website that is being crawled for the purpose of training LLMs. Sure, DoS exists, but it doesn't seem to be enough of a problem to cause widespread outages of websites.

rangerelf 2 days ago | parent [-]

A better analogy is that LLM crawlers are candy store workers going through the houses grabbing free candy and then selling it in their own shop.

Scalpers. Knowledge scalpers.

horsawlarway 2 days ago | parent [-]

Except nothing is actually taken.

It's copied.

If your goal in publishing the site is to drive eyeballs to it for ad revenue... then you probably care.

If your goal in publishing the site is just to let people know a thing you found or learned... that goal is still getting accomplished.

For me... I'm not in it for the fame or money, I'm fine with it.

allturtles 2 days ago | parent | next [-]

I think you're missing a middle ground, of people who want to let people know a thing they found or learned, and want to get credit for it.

Among other things, this motivation has been the basis for pretty much the entire scientific enterprise since it started:

> But that which will excite the greatest astonishment by far, and which indeed especially moved me to call the attention of all astronomers and philosophers, is this, namely, that I have discovered four planets, neither known nor observed by any one of the astronomers before my time, which have their orbits round a certain bright star, one of those previously known, like Venus and Mercury round the Sun, and are sometimes in front of it, sometimes behind it, though they never depart from it beyond certain limits. [0]

[0]: https://www.gutenberg.org/cache/epub/46036/pg46036-images.ht...

bbarnett 2 days ago | parent | prev | next [-]

It's a very simple metric. They had nothing of value, no product, no marketable thing.

Then they scanned your site. They had to, along with others. And in scanning your site, they scanned the results of your work, effort, and cost.

Now they have a product.

I need to be clear here, if that site has no value, why do they want it?

Understand, these aren't private citizens. A private citizen might print out a recipe, who cares? They might even share that with friends. OK.

But if they take it, then package it, then make money? That is different.

In my country, copyright doesn't really punish a person. No one gets hit for copying movies even. It does punish someone, for example, copying and then reselling that work though.

This sort of thing should depend on who's doing it. Their motive.

When search engines were operating an index, nothing was lost. In fact, it was a mutually symbiotic relationship.

I guess what we should really ask, is why on Earth should anyone produce anything, if the end result is no one sees it?

And instead, they just read a summary from an AI?

No more website, no new data, means no new AI knowledge too.

horsawlarway 2 days ago | parent | next [-]

I guess I don't derive my personal value from the esteem of others.

And I don't mean that as an insult, because I get that different people do things for different reasons, and we all get our dopamine hits in different ways.

I just think that if the only reason you choose to do something is because you think it's going to get attention on the internet... Then you probably shouldn't be doing that thing in the first place.

I produce things because I enjoy producing them. I share them with my friends and family (both in person and online). That's plenty. Historically... that's the norm.

> I guess what we should really ask, is why on Earth should anyone produce anything, if the end result is no one sees it?

This is a really rather disturbing view of the world. Do things for you. I make things because I see it. My family sees it. My friends see it.

I grow roses for me and my neighbors - not for some random internet credit.

I plant trees so my kids can sit under them - not for some random internet credit.

bbarnett 2 days ago | parent | next [-]

Context. Note that we're having a discussion about people putting up websites, and being upset about AI snarfing that content.

> I guess what we should really ask, is why on Earth should anyone produce anything, if the end result is no one sees it?

>

> And instead, they just read a summary from an AI?

The above is referring to that context. To people wanting others to see things, and that after all is what this whole website's, this person's concerns are about.

So now that this is reiterated, in the context of someone wanting to show things to the world, why would they produce -- if their goal is lost?

This doesn't mean they don't do things privately for their friends and family. This isn't a binary, 0/1 solution. Just because you have a website for "all those other people" to see, doesn't mean you don't share things between your friends and family.

So what you seem to dislike, is that anyone does it at all. Because again, people writing for eyeballs at large, doesn't mean they aren't separately writing for their friends or family.

It seems to me that you're also creating a schism between "family / friends" and "all those other people". Naturally you care for those close to you, but "those other people" are people too.

And some people just see people as... people. People to share things with.

Yet you seem to be making that a nasty, dirty thing.

horsawlarway 2 days ago | parent [-]

And the content is still there for those people.

The only folks who miss it are the ones who choose to use an llm instead of looking for something different.

I guess my opinion is that you can't "make the horse drink". So instead focus on the groups that care enough to go find your content.

Those people still exist.

If the only joy you got was "the number of people who look at me!"... Then yes, that number is probably going to go down. But I also really do think that's a generally bad reason to be doing an activity.

Again, personalities vary, and I won't deny people (pretty much all of us) crave that type of attention in some form or another. I just think, socially speaking, we're better off with less of that right now.

Anamon a day ago | parent | prev [-]

You conflate doing something with sharing it online. A lot of people do things for themselves, then they post about it and share it because they like the idea of someone else enjoying and getting something out of it. The thing LLMs might get them to stop doing is not the doing of the thing, but the sharing, to the detriment of everyone who actually would have liked to see it.

And no, people sticking to the LLM summary won't get the ideas I shared. They get a crappy, broken, incoherent, messed-up, bland, averaged version of it. Purified of all the personality, insight and thought it might have had in it. That's why people getting an LLM summary partially derived from their data will never seem like a suitable replacement to someone who does it not for the views or credits, but because they actually want to share something of themselves.

I do agree that the solution would best come from the demand side: people realising the inherent blandness and horseshitness of LLM replies, especially when compared to something written by an actual human with thought and intent, ditching the low-quality LLM turds and demanding real content again. The problem I see right now is that pretty much everyone would prefer the human version to the slop, but the megacorps force-feed the slop and spend billions trying to make it as inconvenient as possible to interact with other humans.

shkkmo 2 days ago | parent | prev [-]

> But if they take it, then package it, then make money? That is different

But still, also legal.

You can't copyright a recipe itself, just the fluff around it. It is totally legal for someone to visit a bunch of recipe blogs, copy the recipes, rewrite the descriptions and detailed instructions and then publish that in a book.

This is essentially the same as what LLMs do. So prohibiting this would be a dramatic expansion of the power of copyright.

Personally, I don't use LLMs. I hope there will always be people like me that want to see the original source and verify any knowledge.

I'm actually hopeful that LLM reduction in search traffic will impact the profitability of SEO clickbait referral link garbage sites that now dominate results on many searches. We'll be left with enthusiasts producing content for the joy of nerding out again. Those sites will still have a following of actually interested people and the rest can consume the soulless summaries from the eventually ad infested LLMs.

bbarnett 2 days ago | parent [-]

It may be legal in your jurisdiction, but I think this is a more generic conversation than the specific work class being copied. And further, my point is also that other parts of copyright law, at least where I live, view "for profit copying" and "some dude wanting to print out a webpage" entirely different.

I feel it makes sense.

Amusingly, I feel that an ironic twist would be a judgement that all currently trained LLMs, would be unusable for commercial use.

shkkmo 2 days ago | parent [-]

> other parts of copyright law, at least where I live, view "for profit copying" and "some dude wanting to print out a webpage" entirely different.

I don't know what your jurisdiction is; however, through treaties, much of how USA copyright law works has been exported to many other countries, so it is a reasonable place to base discussion.

In the USA commercial vs. non-commercial is not sufficient to determine if copying violates copyright law. It is one of several factors that are used to determine "fair use" and while it definitely helps, non-commercial use can easily infringe (torrents) and commercial use can be fine (telephone book white pages).

> a judgement that all currently trained LLMs, would be unusable for commercial use

I sure hope not. I don't like or use LLMs but I also don't like copyright law and I hate to see it receive such an expansion of power.

bbarnett 2 days ago | parent [-]

> much of how USA copyright law works has been exported to many other countries

I'm not blaming you for bringing it up, however I did make it clear that I was speaking of a different jurisdiction. And yes, of course you're right, it's always a "big deal" when trade negotiations come up.

Canada has multiple different things in play to protect the individual. The non-profiting dude. Fair use is one, far expanded. Notice-and-notice is another, which currently means you have to pay to send an 'infringed' notice to people, as a copyright owner. Damages are also capped, at an amount that makes legal action untenable for most. And the bar of proof is significantly higher.

And that's for torrents.

For years we've had things like "you pay a tiny tax on hard drives", but then "that means you've already paid for anything you'll ever copy" and the tax goes into a fund to pay Canadian artists. While this may seem strange, it's one solution we've had to help keep art alive, but also not punish the average citizen with crazy lawsuits, and insane attacks from massive law firms.

Essentially, we don't let the US bully us into agreements which are massively harmful to our citizens.

But back to the LLM side. I see the current situation as a weakening of copyright law, a massive one. And not for the average joe, but instead for the most commercial of entities.

I want copyright law, in some circumstances, to be weakened for people. Not companies. They get to pay artists. Creators. Developers.

And of course, there'd be no GPL without copyright law. So while I agree for individuals, especially in the US, copyright law is very annoying and a problem? Let's again focus on what I'm saying.

It currently isn't and doesn't have to be an absolute.

You can and we already have, as we've both discussed, different outcomes for copyright. EG both for fair use and breach outcomes, for corporations/for-profit and just some person. So let's stop talking about copyright stronger/weaker as a generic, and a specific.

I support weaker outcomes of breach, and enhanced fair use for people.

I support stronger outcomes of breach, and so forth for companies.

Further, I support sliding scales too. A one person youtuber isn't the same as a 10B company. A person playing parts of one song in their video for a few seconds, as a one person corp, isn't the same as an entity scanning all of humankind's knowledge and laughing in our faces.

Huge differences of scale and scope.

Look at it this way. Some of these companies have downloaded torrents. If a person did what they did, they'd receive billions in fines!!

Yet they're getting a lesser outcome, as in freaking nothing.

It's the wrong place for copyright weakening.

shkkmo 2 days ago | parent [-]

> I see the current situation as a weakening of copyright law, a massive one. And not for the average joe, but instead for the most commercial of entities.

You're gonna have to explain this in more detail because it isn't clear to me how you justify this claim. What exactly is being weakened? In what way?

> Some of these companies have downloaded torrents. If a person did what they did, they'd receive billions in fines!!

The one I am assuming you are referring to is Meta, and they are getting sued. They arguably should also be facing criminal charges too under current law.

> Yet they're getting a lesser outcome, as in freaking nothing.

That court case hasn't finished and that doesn't have anything directly to do with LLMs but with our legal system and power/wealth imbalances.

> And of course, there'd be no GPL without copyright law.

I personally strongly prefer MIT to GPL. GPL sort of makes sense as a reaction to copyright law but I don't think GPL justifies the existence or state of copyright law.

> Further, I support sliding scales too.

What does that mean? Just the fines / judgements because along with having to pay, the activity itself must be stopped.

If copyright only prohibited larger entities from copying, it would be less onerous and would make copyright more tolerable, but I don't think that would solve the AI training issue in any way and seems like a tangent.

> an entity scanning all of humankind's knowledge and laughing in our faces.

Knowledge is not copyrightable. If you want to stop this, expanding the power of copyright to make learning/knowing something an infringing activity is one of the worst possible ways to go about it.

rangerelf a day ago | parent [-]

> The one I am assuming you are referring to is Meta, and they are getting sued. They arguably should also be facing criminal charges too under current law.

I think your assumption falls short: it's not just Meta, it's OpenAI, it's Anthropic, it's Google, and Microsoft, and others.

Like you said, the court case hasn't finished, but there's meddling from the White House already; I really doubt there's going to be any fair play in this case.

lelanthran 2 days ago | parent | prev | next [-]

> If your goal in publishing the site is just to let people know a thing you found or learned... that goal is still getting accomplished.

I like how you posted so many times in this thread, with the assertion that that is the goal of people giving away stuff for free.

Your responses in this thread are almost a textbook example of a Strawman Argument; you could not do a better Strawman Argument even if you tried!

CJefferson 2 days ago | parent | prev [-]

It's absolutely fine for you to be fine with it. What is nonsense is how copyright laws have been so strict, and suddenly AI companies can just ignore everyone's wishes.

horsawlarway 2 days ago | parent [-]

Hey - no argument here.

I don't think the concept of copyright itself is fundamentally immoral... but it's pretty clearly a moral hazard, and the current implementation is both terrible at supporting independent artists, and a beat stick for already wealthy corporations and publishers to use to continue shitting on independent creators.

So sure - I agree that watching the complete disregard for copyright is galling in its hypocrisy, but the problem is modern copyright, IMO.

...and maybe also capitalism in general and wealth inequality at large - but that's a broader, complicated, discussion.