martin-t 3 days ago

This shouldn't be enforced through technology but the law.

LLMs and other "genAI" (really "generative machine statistics") algorithms just take other people's work, mix it so that any individual training input is unrecognizable and resell it back to them. If there is any benefit to society from LLMs and other A"I" algorithms, then most of the work _by orders of magnitude_ was done by the people whose data is being stolen and trained on.

If you train on copyrighted data, the model and its output should fall under the same license. It's plagiarism, and it should be copyright infringement.

stahorn 2 days ago | parent | next [-]

It's like the world turned upside down in the last 20 years. I used to pirate everything as a teenager, and I found it silly that copyright would follow along no matter how anything was encoded. If I XORed copyrighted material A with open source material B, I would get a strange file C that, together with B, I could use to get material A again. Why would it be illegal for me to send anybody B and C, when the strange file C might just as well be thought of as containing the open source material B?!
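
In Python, the trick looks something like this (a minimal sketch; the byte strings stand in for the two files and are assumed to be the same length):

    # XOR two equal-length byte strings
    def xor_bytes(x: bytes, y: bytes) -> bytes:
        return bytes(p ^ q for p, q in zip(x, y))

    a = b"copyrighted material A"   # the pirated file
    b = b"open source material B"   # a free file of the same length

    c = xor_bytes(a, b)             # "strange file" C: noise on its own
    assert xor_bytes(c, b) == a     # B and C together recover A exactly
    assert xor_bytes(c, a) == b     # symmetrically, A and C recover B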

Now that I've grown up, started paying for what I want, and come to see the need for some way for content creators to get paid for their work, these AI companies pop up. They encode content in a completely new way, and somehow we're supposed to accept that it's fine this time.

This page was posted here on Hacker News a few months ago, and it really shows that this is just what's going on:

https://theaiunderwriter.substack.com/p/an-image-of-an-arche...

Maybe in another 10 years we'll be at the point where these things are considered illegal again?

martin-t 2 days ago | parent | next [-]

I went through exactly this process.

Then I discovered the (A)GPL and realized that the system makes sense as a way to protect user rights.

And as I started making my own money, I started paying instead of pirating, though I sometimes wonder how much of my money goes to the actual artists and creators and how much goes to zero-sum occupations like marketing and management.

---

It comes down to understanding power differentials: we need laws so that large numbers of individuals, each with little power, can defend themselves against a small number of individuals with large amounts of power.

(Well, we could defend ourselves anyway, but it would be illegal and many would see it as an overreaction - as long as they steal only a little from each of us, we're each supposed to be only a little angry.)

---

> Maybe another 10 years and we'll be in the spot when these things are considered illegal again?

That's my hope too. But it requires many people to understand they're being stolen from, and my fear is that way too few people produce "content"[0] and that the majority will feel like they benefit from being able to imitate us with little effort. There's also the angle that the US needs to beat China (even though two nuclear superpowers both lose in an open conflict), and because China has been stealing everything for decades, we (the West) supposedly need to start stealing to keep up too.

[0]: https://eev.ee/blog/2025/07/03/the-rise-of-whatever/#:~:text...

lawlessone 2 days ago | parent | prev [-]

Just pirate again. It's the only way to ensure a game or movie can't be recalled by publishers the next time they want everyone to buy the sequel.

reactordev 2 days ago | parent [-]

Or traded to a different streaming service you aren’t subscribed to - ugh!

thewebguyd 2 days ago | parent | prev | next [-]

> and resell it back to them.

This is the part of this tech I take issue with the most. Outside of open-weight models (and even then, they're not fully open source - the training data is not available, so we cannot reproduce the model ourselves), all the LLM companies are doing is stealing our knowledge (humanity's, collectively) and selling it back to us. It's yet another large-scale, massive transfer of wealth.

These aren't being made for the good of humanity, to be given freely; they are being made for profit, treating human knowledge as raw material to be mined and resold at massive scale.

martin-t 2 days ago | parent [-]

And that's just one part of it.

Part 2 is all the copyleft code powering the world. Now it can be effortlessly laundered. The freedom to inspect and modify? Gone.

Part 3 is what happens if actual AI is created. Rich people (who usually perform zero- or negative-sum work, if any) need the masses (who perform positive-sum work) for a technological civilization to actually function. So we have a lot of bargaining power.

Then an ultra-rich narcissistic billionaire comes along and wants to replace everyone with robots. We're still far off from that even if actual AI is achieved, but the result is not that everyone gets to live a happy post-scarcity life with equality, blackjack and hookers. The result is that we all become beggars dependent on whatever those benevolent owners of AI and robots hand out to us, because we will no longer have anything valuable to provide (besides our bodies, I guess).

cowboylowrez a day ago | parent [-]

Makes me happy to read that at least some folks are thinking about this stuff. To me, this LLM-replacing-humans stuff is ridiculous: we really do have a pretty good supply of humans, whereas do we really have a comparable supply of the resources that go into all of these human-replacing AIs?

jasonvorhe 2 days ago | parent | prev | next [-]

Which law? Which jurisdiction? Written by the same class of people who have been writing laws in their favor for a few centuries already? Pass. Let them consume it all. I'd rather take the gwern approach and write stuff that's unlikely to get filtered out of upcoming models during training. Anubis treats me like a machine, just like Cloudflare, except open source and erroneously in good spirit.

riazrizvi 2 days ago | parent | prev | next [-]

Laws have to be enforceable. When a technology comes along that breaks enforceability, the law/society changes. See also: Prohibition vs. the expansion of homebrewing (’20s/’30s), censorship vs. the expansion of media production (’60s/’70s), encryption bans vs. the open source movement (’90s), music sampling markets vs. music electronics (’80s/’90s)…

throw10920 2 days ago | parent | next [-]

> Laws have to be enforceable.

This is a good point. In this case, though, it does seem pretty easy to enforce: just require anyone hosting an LLM for others to use to have full provenance for all of the data the LLM was trained on. Wouldn't that solve the problem fairly easily? It's not like LLM training can be done in your garage (in which case this requirement would kill off the hundreds/thousands of small LLM-training businesses that would hypothetically otherwise exist).
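
As a rough sketch of what that could look like in practice (the manifest format and the license allowlist here are entirely hypothetical):

    # Hypothetical provenance manifest: one record per training document.
    manifest = [
        {"source": "https://example.org/blog/42", "license": "CC-BY-4.0"},
        {"source": "https://example.org/repo/xyz", "license": "AGPL-3.0"},
        {"source": "unknown", "license": None},
    ]

    # Licenses the host is allowed to train on (purely illustrative).
    ALLOWED = {"CC0-1.0", "CC-BY-4.0", "MIT"}

    def audit(records):
        """Return every record whose provenance or license fails the policy."""
        return [r for r in records if r["license"] not in ALLOWED]

    for violation in audit(manifest):
        print("cannot train on:", violation["source"])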

martin-t 2 days ago | parent | prev [-]

In most of those cases, it was because too many people broke the laws, regardless of what companies did. It was too distributed.

But to train a model, you need a huge amount of compute, centralized and owned by a large corporation. Cut the problem at the root.

visarga 2 days ago | parent | prev [-]

> algorithms just take other people's work, mix it so that any individual training input is unrecognizable and resell it back to them

LLMs are huge and need special hardware to run. Cloud providers underprice even local hosting. Many providers offer free access.

But why are you not talking about what the LLM user brings? They bring a unique task or problem to solve. They guide the model and channel it towards the goal. In the end, they take the risk of using anything from the LLM. They bring the context, and they are the sink for the consequences.

martin-t 2 days ago | parent | next [-]

Quantity matters.

Imagine it took 10^12 hours to produce the training data, 10^6 hours to produce the training algorithm, and 10^0 hours to write a bunch of prompts to get the model to generate a useful output.

How should the reward be distributed among the people who performed the work?
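
To make the naive proportional split concrete (using the hypothetical hours above, not real measurements):

    hours = {
        "training data": 10**12,
        "training algorithm": 10**6,
        "prompting": 10**0,
    }
    total = sum(hours.values())

    for role, h in hours.items():
        print(f"{role}: {h / total:.10%} of the work")

    # training data      ~ 99.9999% of the work
    # training algorithm ~  0.0001%
    # prompting          ~  0.0000000001%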

lawlessone 2 days ago | parent | prev [-]

> But why are you not talking about what the LLM user brings? They bring a unique task or problem to solve. They guide the model and channel it towards the goal. In the end, they take the risk of using anything from the LLM.

I must remember, next time I'm shopping, to demand the staff thank me when I ask them where the eggs are.

martin-t 2 days ago | parent [-]

I was gonna make an analogy about stealing someone's screwdriver set when I need to solve a unique problem, but this is so much better.

lawlessone 2 days ago | parent [-]

That's good too.