| ▲ | CaptainFever 7 months ago |
This is, in fact, the core value of the hacker ethos that Hacker News is named for:

> The belief that information-sharing is a powerful positive good, and that it is an ethical duty of hackers to share their expertise by writing open-source code and facilitating access to information and to computing resources wherever possible.

> Most hackers subscribe to the hacker ethic in sense 1, and many act on it by writing and giving away open-source software. A few go further and assert that all information should be free and any proprietary control of it is bad; this is the philosophy behind the GNU project.

http://www.catb.org/jargon/html/H/hacker-ethic.html

Perhaps where the Internet didn't kill copyright, AI will. (Hyperbole.)

(Personally, my belief is more nuanced than this; I'm fine with very limited copyright, but my position is closer to yours than to the current system we have.)

| ▲ | raincole 7 months ago |
OpenAI scraping copyrighted materials to make a proprietary model is the exact opposite of what GNU promotes.

| ▲ | CaptainFever 7 months ago |
As I mentioned in another comment: scraping copyrighted materials is not the wrong thing to do. Making the result proprietary is. It is important to be clear about what exactly is wrong, so you don't accidentally end up fighting for copyright expansion, or fighting against open models.

| ▲ | YetAnotherNick 7 months ago |
If you open the link, it will be clearer that GNU wants to share information even with so-called bad hackers.

| ▲ | onetokeoverthe 7 months ago |
Creators freely sharing their work with attribution requested is different from creations being ruthlessly harvested and repurposed without permission. https://creativecommons.org/share-your-work/

| ▲ | a57721 7 months ago |
> freely sharing with attribution requested

If I share my texts/sounds/images for free, harvesting and regurgitating them omits the requested attribution. Even the most permissive CC license (excluding CC0 public domain) still requires attribution.

| ▲ | CaptainFever 7 months ago |
> A few go further and assert that all information should be free and any proprietary control of it is bad; this is the philosophy behind the GNU project.

In this view, the ideal world is one where copyright is abolished (but not moral rights). So piracy is good, and datasets are also good. Asking creators to license their work freely is simply a compromise made because copyright unfortunately still exists. (Note that even if creators don't license their work freely, this view still permits you to pirate or mod it against their wishes.)

(My view is not this extreme, but my point is that this view was, and hopefully still is, common amongst hackers.)

I will ignore the moralizing words (e.g. "ruthless", or "harvested" to mean "copied"); they're not productive to the conversation.

| ▲ | onetokeoverthe 7 months ago |
If not respected, some creators will strike, lie flat, not post, or go underground. Ignoring the moral rights of creators is the issue.

| ▲ | CaptainFever 7 months ago |
Moral rights involve the attribution of works where reasonable and practical. Clearly, doing so during inference is not reasonable or practical (you'd have to attribute all of humanity!), but attributing individual sources is possible and is already being done in cases like ChatGPT Search. So I don't think you actually mean moral rights, since those are not being ignored here.

But the first sentence of your comment still stands regardless of what you meant by moral rights. To that, well... we're still commenting here, are we not? Despite it with almost 100% certainty being used to train AI. We're still here.

And yes, funding is a thing, which I agree needs copyright for the most part, unfortunately. But does training AI on, for example, a book really reduce the need to buy the book, if the book is not reproduced? Remember, training is not just about facts, but about learning how humans talk, how languages work, how books work, etc. Learning that won't reduce the book's economic value.

And yes, summaries may reduce the value. But summaries already exist: Wikipedia, Cliff's Notes. I think the main defense is that you can't copyright facts.

| ▲ | onetokeoverthe 7 months ago |
> we're still commenting here, are we not? Despite it with almost 100% certainty being used to train AI. We're still here.

?!?! You're comparing and equating commenting to creative works. These comments are NOT equivalent to the 17 full-time months it took me to write a nonfiction book, or to an 8-year art project. When I give away my work, I decide to whom and how.

| ▲ | CaptainFever 7 months ago |
I have already covered these points in the latter paragraphs. You might want to take a look at https://www.gnu.org/philosophy/shouldbefree.en.html

| ▲ | onetokeoverthe 7 months ago |
I'll decide the distribution of my work. Be it 100 million unique views or NOT at all.

| ▲ | CaptainFever 7 months ago |
If you don't have a proper argument, it's best not to distribute your comment at all.

| ▲ | onetokeoverthe 7 months ago |
If saying "it's my work" is not a "proper" argument, that says it all.

| ▲ | CaptainFever 7 months ago |
Indeed, owner. Look, either actually read the link and refute the points within, or don't. But there's no use discussing anything if you're unwilling to even understand and seriously refute a single point being made here, other than repeating "mine, mine, mine".

| ▲ | onetokeoverthe 7 months ago |
Read it. Lots of "nots", and no respect:

> In the process, [OpenAI] trained ChatGPT not to acknowledge or respect copyright, not to notify ChatGPT users when the responses they received were protected by journalists' copyrights, and not to provide attribution when using the works of human journalists.

| ▲ | AlienRobot 7 months ago |
I think an ethical hacker is someone who uses their expertise to help those without it. How could an ethical hacker side with OpenAI, when OpenAI is using its technological expertise to exploit creators without it?

| ▲ | CaptainFever 7 months ago |
I won't necessarily argue against that moral view, but in this case it is two large corporations fighting. One has the power of tech; the other has the power of the state (copyright). So I don't think that view applies in this case specifically.

| ▲ | Xelynega 7 months ago |
Aren't you ignoring that common law is built on precedent? If they win this case, that makes it a lot easier for people whose copyright is being infringed on an individual level to get justice.

| ▲ | CaptainFever 7 months ago |
You're correct, but I think many don't realize how many small model trainers and fine-tuners there currently are; for example, PonyXL, or the many models and fine-tunes made by hobbyists on CivitAI. So basically the reasoning is this:

- NYT vs OpenAI: neither is disenfranchised
- OpenAI vs individual creators: creators are disenfranchised
- NYT vs individual model trainers: model trainers are disenfranchised
- Individual model trainers vs individual creators: neither is disenfranchised

And if only one side can win, and since the view is that information should be free, this biases the argument towards the model trainers.

| ▲ | AlienRobot 7 months ago |
What "information" are you talking about? It's a text and image generator. Your argument is that it's okay to scrape content when you are an individual. It doesn't change the fact that those individuals are people with technical expertise using it to exploit people without it.

If they wrote a bot to annoy people but published how many people got angry about it, would you say it's okay because that is information? You need to draw the line somewhere.

| ▲ | CaptainFever 7 months ago |
Text and images are information, though.

> If they wrote a bot to annoy people but published how many people got angry about it, would you say it's okay because that is information?

Kind of? It's not okay, but not because it uses information without consent (this is the "information should be free" part); it's because it intentionally and unnecessarily annoys and angers people (this is the "don't use the information for evil" part, which I think is your position).

"See? Similarly, even in your view, model trainers aren't bad because they're using data. They're bad in general because they're exploiting creatives."

But why is it exploitative?

"They're putting the creatives out of a job."

But this applies to automation in general.

"They're putting creatives out of a job, using data they created."

This is the strongest argument for me. It does intuitively feel exploitative. However, there are several issues:

1. Not all models or datasets do that. For instance, no one is visibly getting paid to write comments on HN, or to write fanfics on the non-commercial fanfic site AO3. Since the data creators are not doing it as a job in the first place, it does not make sense to talk about them losing their jobs because of that very same data.

2. Not all models or datasets do that. For example, spam filters and AI classifiers can be trained on the entire Internet without being exploitative, because there is no job replacement involved.

3. Some models already do that and are already well and morally accepted. For example, Google Translate.

4. This may be resolved by going the other way and making more models open source (or even leaked), so more creatives can use them freely and share in the productive power.

"Because they're using creatives' information without consent."

But as mentioned, it's not about the information or consent. It's about what you do with the information.

Finally, because this is a legal case, it's also important to talk about the morality of using the state to restrict people from using information freely, even if their use of the information is morally wrong. If you believe in free culture as in free speech, then it is wrong to restrict such a use through the law, even though we might agree it is morally wrong. But this really depends on whether you believe in free culture as in free speech in the first place, which is a debate much larger than this one.

| ▲ | Xelynega 7 months ago |
I don't understand what the "hacker ethos" could have to do with defending OpenAI's blatant stealing of people's content for their own profit. OpenAI is not sharing their data (they're keeping it private to profit off of it), so how could it be anywhere near the "hacker ethos" to believe that everyone else needs to hand over their data to OpenAI for free?

| ▲ | CaptainFever 7 months ago |
Following the "GNU-flavour hacker ethos" as described, one concludes that it is right for OpenAI to copy data without restriction, that it is wrong for NYT to restrict others from using their data, and that it is also wrong for OpenAI to restrict the sharing of their model weights or the use of their outputs for training.

Luckily, most people seem to ignore OpenAI's hypocritical TOS clause against training on their outputs. I would go one step further and say that they should share the weights completely, but I understand there are practical issues with that. Luckily, we can kind of "exfiltrate" the weights by training on their output. Or wait for someone to leak them, as happened with NovelAI.

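To make "training on their output" concrete, here is a minimal sketch of the idea (sequence-level distillation: collect a proprietary "teacher" model's outputs and fine-tune an open "student" on them). It assumes the `openai`, `datasets`, and `transformers` packages; the model names, prompts, and hyperparameters are illustrative placeholders, not anyone's actual pipeline.

```python
# Sketch: approximate a proprietary model by fine-tuning an open one on its
# outputs. All model names, prompts, and hyperparameters are placeholders.
from openai import OpenAI
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompts = [
    "Explain copyright law in one paragraph.",
    "Summarize the hacker ethic in two sentences.",
]  # a real attempt would use many thousands of prompts

# Step 1: collect the teacher's outputs.
records = []
for prompt in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder teacher model
        messages=[{"role": "user", "content": prompt}],
    )
    records.append({"text": prompt + "\n" + resp.choices[0].message.content})

# Step 2: fine-tune a small open student model on those prompt/output pairs.
tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder student model
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(batch):
    enc = tok(batch["text"], truncation=True, padding="max_length",
              max_length=256)
    enc["labels"] = enc["input_ids"].copy()  # causal LM: labels = inputs
    return enc

train_ds = Dataset.from_list(records).map(
    tokenize, batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
).train()
```

This only imitates the teacher's behaviour on the collected prompts; it does not literally recover the weights, which is why "exfiltrate" is in scare quotes above.
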
| ▲ | ysofunny 7 months ago |
oh please, then riddle me why my comment has -1 votes on "hacker" news, which has indeed turned into "i-am-rich-cuz-i-own-tech-stock" news.

| ▲ | alwa 7 months ago |
I did not contribute a vote either way to your comment above, but I would point out that you get more of what you reward. Maybe the reward is monetary, like an author paid for spending their life writing books. Maybe it's smaller, more reputational or social, like the people who generate thoughtful commentary here, or Wikipedia's editors, or hobbyists' forums.

When you strip people's names from their words, as the specific count here charges; and you strip out any reason, or even way, for people to reward good work when they appreciate it; and you put the disembodied words in the mouth of a monolithic, anthropomorphized statistical model tuned to mimic a conversation partner... what type of thought becomes abundant in this world of "data abundance" that you propose?

In that world, the only people who still have an incentive to create are the ones whose content has negative value, who make things people otherwise wouldn't want to see: advertisers, spammers, propagandists, trolls... where's the upside of a world saturated with that?

| ▲ | CaptainFever 7 months ago |
Yes, I have no idea either. I find it disappointing. I think people simply like it when data is liberated from corporations, but hate it when data is liberated from them. (Though the plaintiff in this case is a corporation too, so I don't know. Maybe it's just "AI bad"?)