throw646577 7 months ago

> but is training an AI copying?

If the AI produces chunks of its training set nearly verbatim when prompted, it looks like copying.

> And if so, why isn't someone learning from said work considered copying in their brain?

Well, their brain, while learning, is not someone's published work product, for one thing. This should be obvious.

But their brain can violate copyright by producing work as the output of that learning, and be guilty of plagiarism, etc. If I memorise a passage of your copyrighted book when I am a child, and then write it in my book when I am an adult, I've infringed.

The fact that most jurisdictions don't consider the work of an AI to be copyrightable does not mean it cannot ever be infringing.

CuriouslyC 7 months ago | parent | next [-]

The output of a model can be a copyright violation. In fact, even if the model had never been trained on copyrighted content, if I provided copyrighted text and then told the model to regurgitate it verbatim, that would be a violation.

That does not make the model itself a copyright violation.

throw646577 7 months ago | parent [-]

This is sort of like the argument against a blank tape levy or a tape copier tax, which is a reasonable argument in the context of the hardware.

But an LLM doesn't just enable direct duplication; it (or rather, its model) contains the duplicated work.

If software had a meaningful distribution cost or per-unit sale cost, a blank tape tax would be very appropriate for LLM sales.

But instead OpenAI is operating a for-pay duplication service where authors don't get a share of the proceeds -- it is doing the very thing that copyright laws were designed to dissuade by giving authors a time-limited right to control the profits from reproducing copies of their work.

trinsic2 7 months ago | parent | prev [-]

Yeah, good point. What's the difference between spidering content and training a model? It's almost like accessing pages of content the way a search engine does. What if the information is publicly available?