| ▲ | simianwords 2 days ago |
| > Ethical concerns. The business side of AI boom is creating serious ethical concerns. Among them:
> Commercial AI projects are frequently indulging in blatant copyright violations to train their models.
> Their operations are causing concerns about the huge use of energy and water.
> The advertising and use of AI models has caused a significant harm to employees and reduction of service quality.
> LLMs have been empowering all kinds of spam and scam efforts.

Highly disingenuous. First, AI being trained on copyrighted data is considered fair use because it transforms the underlying data rather than distributing it as is. I have to agree that this is the relatively strongest ethical claim against using AI, but it stands weak if looked at on the whole.

The fact that they mentioned "energy and water use" should tell you that they are really looking for reasons to disparage AI. AI doesn't use any more water or energy than any other tool. An hour of Netflix uses the same energy as more than 100 GPT questions. A single 10-hour flight (per person) emits as much as around 100k GPT prompts.

It is strange that one would repeat the same nonsense about AI without the primary motive being ideological. "The advertising and use of AI models has caused a significant harm to employees and reduction of service quality." This is just a shoddy opinion at this point.

To be clear: I understand why they might ban AI for code submissions. It lowers the barrier significantly and increases the noise. But the reasoning is motivated from the wrong place. |
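The streaming comparison above is just arithmetic, and can be sanity-checked with a back-of-envelope calculation. Both figures below are assumed round numbers for illustration (roughly 0.3 Wh per LLM query and roughly 77 Wh per streamed video hour), not measurements, and real values vary widely by model, resolution, and infrastructure:

```python
# Back-of-envelope check of the "hour of Netflix vs. GPT questions" claim.
# Both inputs are assumptions for illustration, not measured values.
wh_per_query = 0.3          # assumed energy per LLM query, watt-hours
wh_per_stream_hour = 77.0   # assumed energy per streamed video hour, watt-hours

queries_per_hour_of_streaming = wh_per_stream_hour / wh_per_query
print(f"One streamed hour ~= {queries_per_hour_of_streaming:.0f} queries")
# prints: One streamed hour ~= 257 queries
```

Under these assumed inputs the "more than 100 queries per streamed hour" claim holds, though the ratio moves substantially if either estimate is off by a factor of a few.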
|
| ▲ | themafia 2 days ago | parent | next [-] |
| > AI being trained on copyrighted data is considered fair use because it transforms the underlying data rather than distributing it as is.

It's not binary. Sometimes it fully reproduces works in violation of copyright, and other times it modifies them just enough to avoid claims against its output. Using AI and just _assuming_ it would never lead you to a copyright violation is foolish.

> uses the same energy as more than 100 GPT questions.

Are you including training costs or just query costs?

> But the reasoning is motivated from the wrong place.

That does not matter. What matters is whether the outcome improves in the way they predict. That is actually measurable. |
| |
| ▲ | simianwords 2 days ago | parent [-] |

> That does not matter. What matters is whether the outcome improves in the way they predict. That is actually measurable.

Ok, let's discuss facts.

> It's not binary. Sometimes it fully reproduces works in violation of copyright, and other times it modifies them just enough to avoid claims against its output. Using AI and just _assuming_ it would never lead you to a copyright violation is foolish.

In the Anthropic case, the judge ruled that AI training is transformative. It is not binary, as you said, but I'm criticising what appears as binary in the original policy. When the court ruling itself has shown that it is not a violation of copyright, it is reasonable to criticise it now, although I acknowledge the post was written before the ruling.

> Are you including training costs or just query costs?

The training costs are very small because they are amortised over all the queries. I think training accounts for around 0.001% to 0.1% of each query, depending on how many training runs are done over a year. | | |
| ▲ | twelvechairs 2 days ago | parent [-] |

On copyright, it's worth noting that Gentoo has a substantial user base outside the USA (maybe primarily; see [0]), for whom the Anthropic judgment you mention probably doesn't mean much.

[0] https://trends.builtwith.com/Server/Gentoo-Linux | | |
| ▲ | simianwords 2 days ago | parent [-] |

Fair point, but I would think the EU would be all over this. It is right up their alley and clearly an easy way to justify more regulation and slow down AI. Why hasn't anything come out of it? |
|
|
|
|
| ▲ | ses1984 2 days ago | parent | prev | next [-] |
| The idea that models are transformative is debatable. Copyrighted works are the thing that imbues the model with value. If that statement isn't true, then they can just exclude those works and nothing is lost, right?

Also, half the problem isn't distribution, it's how those works were acquired. Even if you suppose models are transformative, you can't just download stuff from The Pirate Bay; you have to buy copies, scan them, rip them, etc. It's super not cool that billion-dollar VC companies can just do that. |
| |
| ▲ | simianwords 2 days ago | parent | next [-] |

> In Monday's order, Senior U.S. District Judge William Alsup supported Anthropic's argument, stating the company's use of books by the plaintiffs to train their AI model was acceptable. "The training use was a fair use," he wrote. "The use of the books at issue to train Claude and its precursors was exceedingly transformative."

I agree it is debatable, but it is not so clear-cut that it is _not_ transformative when a judge has ruled that it is. | |
| ▲ | perching_aix 2 days ago | parent | prev [-] |

> The idea that models are transformative is debatable. Works with copyright are the thing that imbues the model with value. If that statement isn't true, then they can just exclude those works and nothing is lost, right?

I don't follow. For one, all works have a copyright status, I believe (under US jurisdiction; this of course differs per jurisdiction, although there are international IP laws); some are just extremely permissive. Models rely on a wide range of works, some with permissive, some with restrictive licensing. I'd imagine Wikipedia and StackOverflow are pretty important resources for these models, for example, and both are licensed under CC BY-SA 4.0, a permissive license.

Second, even granting that your claim is thus false, dropping restrictively copyrighted works would certainly make a dent, although I'm not sure how much. I don't see why this would be a surprise: restrictively licensed works do contribute value, but not all of the value. So their removal would take away some of the value, but not all of it. It's not binary.

And finally, I'm not sure these aspects solely or even primarily determine whether these models are legally transformative. But then I'm also not a lawyer, and the law is a moving target, so what do I know. I'd imagine it's less legal transformativeness and more colloquial transformativeness you're concerned about anyhow, but then these are not necessarily the best aspects to interrogate either. |
|
|
| ▲ | notpachet 2 days ago | parent | prev | next [-] |
| > AI doesn't use any more water or energy than any other tool. An hour of Netflix uses the same energy as more than 100 GPT questions. A single 10-hour flight (per person) emits as much as around 100k GPT prompts. It is strange that one would repeat the same nonsense about AI without the primary motive being ideological.

We should stop doing those things too. I'm still surprised that so many people are flying. |
| |
| ▲ | simianwords a day ago | parent [-] |

I agree, but the magnitudes are important. I don't want to give up a few prompts per day because of the climate. That would be stupid. |
|
|
| ▲ | 2 days ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | shmerl 2 days ago | parent | prev | next [-] |
| I don't get this idea. Transformative works don't automatically equal fair use: copyright covers all kinds of transformative works. |
|
| ▲ | CursedSilicon 2 days ago | parent | prev | next [-] |
| That's quite a strawman definition of "copyright infringement", especially given the ongoing Anthropic lawsuit. It's not a question of whether feeding all the world's books into a blender and eating the resulting slurry paste is copyright infringement. It's that they stole the books in the first place by getting them from piracy websites.

If they'd purchased every book ever written, scanned them in, and fed that into the model? That would be perfectly legal. |
| |
| ▲ | steveklabnik 2 days ago | parent [-] | | That’s what happened; the initial piracy was an issue, but those models were never released, and the models that were released were trained on copyrighted works they purchased. | | |
| ▲ | boristsr 2 days ago | parent [-] |

That's not true, or they wouldn't have settled for $1.5 billion specifically for training on pirated material.

https://apnews.com/article/anthropic-copyright-authors-settl... | | |
| ▲ | steveklabnik a day ago | parent [-] |

As I said, the initial piracy was an issue. That is what they settled over. Your link covers this:

> A federal judge dealt the case a mixed ruling in June, finding that training AI chatbots on copyrighted books wasn’t illegal but that Anthropic wrongfully acquired millions of books through pirate websites.

It also has more details about how they later did it legally, and that that was fine, but it did not excuse the earlier piracy:

> But documents disclosed in court showed Anthropic employees’ internal concerns about the legality of their use of pirate sites. The company later shifted its approach and hired Tom Turvey, the former Google executive in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles.

> With his help, Anthropic began buying books in bulk, tearing off the bindings and scanning each page before feeding the digitized versions into its AI model, according to court documents. That was legal but didn’t undo the earlier piracy, according to the judge. |
|
|
|
|
| ▲ | infamia 2 days ago | parent | prev [-] |
| > Highly disingenuous. First, AI being trained on copyrighted data is considered fair use because it transforms the underlying data rather than distributing it as is.

Your legal argument aside, they downloaded torrents and trained their AI on them. You can't get much more blatant than that. |
| |
| ▲ | simianwords 2 days ago | parent [-] |

Yes, but that was one company, and it is not core to their infra or product. So I don't know how one can characterize AI as fundamentally unethical because one company pirated some books. |
|