krisoft 7 months ago

> I guess I should have used the phrase "common sense stealing in any other context" to be more precise?

Clearly not common sense stealing. The Intercept was not deprived of their content. If OpenAI had sneaked into their office and server farm and taken all the hard drives and paper copies containing the content, that would be "common sense stealing".

TheOtherHobbes 7 months ago | parent [-]

Very much common sense copyright violation though.

Copyright means you're not allowed to copy something without permission.

It's that simple. There is no "Yes but you still have your book" argument, because copyright is a claim on commercial value, not a claim on instantiation.

There's some minimal wiggle room for fair use, but clearly making an electronic copy and creating a condensed electronic version of the content - no matter how abstracted - and using it for profit is not fair use.

chii 7 months ago | parent [-]

> Copyright means you're not allowed to copy something without permission.

but is training an AI copying? And if so, why isn't someone learning from said work considered copying in their brain?

throw646577 7 months ago | parent | next [-]

> but is training an AI copying?

If the AI produces chunks of training set nearly verbatim when prompted, it looks like copying.

> And if so, why isn't someone learning from said work considered copying in their brain?

Well, their brain, while learning, is not someone's published work product, for one thing. This should be obvious.

But their brain can violate copyright by producing work as the output of that learning, and be guilty of plagiarism, etc. If I memorise a passage of your copyrighted book when I am a child, and then write it in my book when I am an adult, I've infringed.

The fact that most jurisdictions don't consider the work of an AI to be copyrightable does not mean it cannot ever be infringing.

CuriouslyC 7 months ago | parent | next [-]

The output of a model can be a copyright violation. In fact, even if the model was never trained on copyrighted content, if I provided copyrighted text and then told the model to regurgitate it verbatim, that would be a violation.

That does not make the model itself a copyright violation.

throw646577 7 months ago | parent [-]

This is sort of like the argument against a blank tape levy or a tape copier tax, which is a reasonable argument in the context of the hardware.

But an LLM doesn't just enable direct duplication; it (well, its model) contains it.

If software had a meaningful distribution cost or per-unit sale cost, a blank tape tax would be very appropriate for LLM sales.

But instead OpenAI is operating a for-pay duplication service where authors don't get a share of the proceeds -- it is doing the very thing that copyright laws were designed to dissuade by giving authors a time-limited right to control the profits from reproducing copies of their work.

trinsic2 7 months ago | parent | prev [-]

Yeah, good point. What's the difference between spidering content and training a model? It's almost like accessing pages of content like a search engine does, if the information is publicly available?

pera 7 months ago | parent | prev | next [-]

A product from a company is not a person. An LLM is not a brain.

If you transcode a CD to mp3 and build a business around selling these files without the author's permission, you'd be in big legal trouble.

Tech products that "accidentally" reproduce materials without the owners' permission (e.g. someone uploading La La Land to YouTube) have processes to remove them by simply filing a form. Can you do that with ChatGPT?

lelanthran 7 months ago | parent | prev | next [-]

Because the law considers scale.

It's legal for you to possess a single joint. It's not legal for you to possess a warehouse of 400 tons of weed.

The line between legal and not legal is sometimes based on scale; being able to ingest a single book and learn from it is not the same scale as ingesting the entire published works of mankind and learning from it.

krisoft 7 months ago | parent [-]

Are you describing what the law is or what you feel the law should be? Because those things are not always the same.

lelanthran 7 months ago | parent [-]

> Are you describing what the law is or what you feel the law should be?

I am stating what is, right now.

I thought the weed example made that clear.

Let me clarify: the state of things, as they stand, is that the entire justice system, legislation and courts included, takes scale into account when looking at the line dividing "legal" from "illegal".

There is literally no defense of "If it is legal at qty x1, it is legal at any qty".

krisoft 7 months ago | parent [-]

> I am stating what is, right now.

Excellent. Then the next question is: where (in which jurisdiction) are you describing the law? And what are your sources? Not about the weed, I don't care about that. Particularly the claim that "being able to ingest a single book and learn from it is not the same scale as ingesting the entire published works of mankind and learning from it".

The reason why I'm asking is because you are drawing a parallel between criminal law and (I guess?) copyright infringement. The drug possession limits in many jurisdictions are explicitly written into the law. These are not some grand principle of law but the result of explicit legislative intent. The people writing the law wanted to punish drug peddlers without punishing end users. (Or they wanted to punish them less severely or differently.) Are the copyright limits you are thinking about similarly written down? Do you have case references one can read?

lelanthran 7 months ago | parent [-]

I made it clear in both my responses that scale matters, and that there is precedent in law, in almost all countries I can think of right now, for scale mattering.

I did not make the point that there is a written law specifically for copyright violations at scale (although many jurisdictions do have exemptions at small scale written into law).

I will try to clarify once again: there is no defence in law that because something is allowed at qty X1, it must be allowed at any qty.

This is the defence originally posted that I replied to, and it is not valid, because courts regularly consider the scale of an activity when determining the line between allowed and not allowed.

nkrisc 7 months ago | parent | prev | next [-]

Because AI isn’t a person.

hiatus 7 months ago | parent | prev [-]

Is training an AI the same as a person learning something? You haven't shown that to be the case.

chii 7 months ago | parent [-]

No, I haven't, but judging by the name - machine learning - I think it is the case.

yyuugg 7 months ago | parent [-]

Do you think starfish and jellyfish are fish? Judging by the name they are...