> Lacking Copyright (or similarly a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.

Can someone explain this to me? I was under the impression that if a work of authorship was not copyrightable because it was AI generated and not authored by a human, it was in the public domain and therefore you could do whatever you wanted with it. Normal copyright restrictions would not apply here.

Joel_Mckay 3 hours ago | parent [-]

Content taken from the web through theft of service or piracy, along with "AI" users' own content, is used in the model training sets, and once encoded, its statistical salience is significant when popular content is present.

For example, when an LLM does a vector search, there is a high probability of pirated content bleed-through and isomorphic plagiarism in the high-dimensional vector space results. Thus, when you coincidentally type in "name a cartoon mouse", there is a higher probability that Disney's "Mickey Mouse" will pop out in the output rather than "Mighty Mouse". Note that trademarks never expire if the fees are paid, and Disney can still technically sue anyone that messes with their mouse.

Similarly, telling the current set of models to stop using em dashes "--" inappropriately often fails. Also, activation capping is used to improve the model's behavioral vector, and has nothing to do with the Anthropic CEO developing political ethics.

LLMs are useful for context search, but can't function properly without constantly stealing from actual humans. Thus, they will often violate copyright, trademarks, and patents. In a commercial context it is legally irrelevant how the output has misappropriated IP, and you can bet your wallet the lawyers won't care either. No, IP is not public domain for a long time (17 to 78 years) regardless of people's delusions, even if some kid in a place like India (no software patents) thinks it is.

This channel offers several simplified explanations of the work being done with models, and Anthropic posts detailed research papers on its website.

https://www.youtube.com/watch?v=YDdKiQNw80c

https://www.youtube.com/watch?v=Xx4Tpsk_fnM

https://www.youtube.com/watch?v=JAcwtV_bFp4

Many YC bots are poisoning discourse -- so this thread will likely get negative karma. Some LLM users seem to develop emotional or delusional relationships with the algorithms. The internet is already >52% generated nonsense and growing. =3

	▲	ethin 14 minutes ago | parent [-]
		This does not answer my question. The quoted content said that "Lacking Copyright (or similarly a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now." I was explicitly asking how this meshed with my understanding of copyright, at least in the United States, which requires that a work of authorship be authored by a human and not by a machine; where a work is not authored by a human, copyright protection does not subsist, and therefore the respective work is in the public domain. And I was further asking for an explanation as to how including a work that is AI-generated (aka in the public domain) resulted in "... that aggregate body becoming less free". Unless my understanding of copyright law and court precedent is massively off the mark, I am confused as to how less freedom is afforded in this instance.