OpenAI is not profitable, and to achieve what they have achieved they had to scrape basically the entire internet. I don't have a hard time believing that OpenAI could not exist if they had to respect copyright.

https://www.cnbc.com/2024/09/27/openai-sees-5-billion-loss-t...


noitpmeder 7 months ago | parent | next [-]

That's a good thing! If a company cannot rise to fame unless they violate laws, it should not have been there.

There is plenty of public domain text that could have taught an LLM English.


suby 7 months ago | parent | next [-]

I'm not convinced that the economic harm to content creators is greater than the productivity gains and accessibility of knowledge for users (relative to how competent it would be if trained just on public domain text). Personally, I derive immense value from ChatGPT / Claude. It's borderline life changing for me.

As time goes on, I imagine that it'll increasingly be the case that these LLMs will displace people from their jobs / careers. I don't know whether the harm done will be greater than the benefit to society. I'm sure the answer will depend on who it is that you ask.

> That's a good thing! If a company cannot rise to fame unless they violate laws, it should not have been there.

Obviously given what I wrote above, I'd consider it a bad thing if LLM tech severely regressed due to copyright law. Laws are not inherently good or bad. I think you can make a good argument that this tech will be a net negative for society, but I don't think it's valid to do so just on the basis that it is breaking the law as it is today.


DrillShopper 7 months ago | parent [-]

> I'm not convinced that the economic harm to content creators is greater than the productivity gains and accessibility of knowledge for users (relative to how competent it would be if trained just on public domain text).

Good thing whether or not something is a copyright violation doesn't depend on whether you can make more money with someone else's work than they can.

suby 7 months ago | parent [-]

I understand the anger about large tech companies using others' work without compensation, especially when both they and their users benefit financially. But this goes beyond economics. LLM tech could accelerate advances in medicine and technology. I strongly believe that we're going to see societal benefits in education and healthcare, especially mental health support, thanks to this tech. I also think that someone making money off LLMs is a separate question from whether or not the original creator has been harmed. I think many creators are going to benefit from better tools, and we'll likely see new forms of creation become viable.

We already recognize that certain uses of intellectual property should be permitted for society's benefit. We have fair use doctrine, patent compulsory licensing for public health, research exemptions, and public libraries. Transformative use is also permitted, and LLMs are inherently transformative. Look at the volume of data that they ingest compared to the final size of a trained model, and how fundamentally different the output format is from the input data.

Human progress has always built upon existing knowledge. Consider how both Darwin and Wallace independently developed evolution theory at roughly the same time -- not from isolation, but from building on the intellectual foundation of their era. Everything in human culture builds on what came before.

That all being said, I'm also sure that this tech is going to negatively impact people too. Like I said in the other reply, whether or not this tech is good or bad will depend on who you ask. I just think that we should weigh these costs against the potential benefits to society as a whole rather than simply preserving existing systems, or blindly following the law as if the law is inherently just or good. Copyright law was made before this tech was even imagined, and it seems fair to now evaluate whether the current copyright regime makes sense if it turns out that it'd keep us in some local maximum.


YetAnotherNick 7 months ago | parent | prev [-]

> unless they violate laws

*unless they violate a country's laws.

Which means OpenAI or an alternative could survive in China but not in the US. The question is whether we are fine with that.


jpalawaga 7 months ago | parent | prev [-]

Technically, OpenAI has respected copyright, except in the (few) instances where it reproduces non-fair-use amounts of copyrighted material.

The DMCA does not cover scraping.