| ▲ | lovelearning 4 days ago |
| I thought the title meant the training data used was ethics content and ethical reasoning. Turns out "ethically trained" means the training data used doesn't violate copyright laws. |
|
| ▲ | CoastalCoder 3 days ago | parent | next [-] |
| I really dislike the way people use "ethical" as though it were an unambiguous, binary concept. Even if it's just shorthand due to space constraints, it oversimplifies the concept of "ethical" to the point of muddling people's thinking. |
|
| ▲ | RobotToaster 3 days ago | parent | prev | next [-] |
| I thought it was trained using Victorian ethics at first... Like it was only trained on computers powered by coal mined by children. |
| |
| ▲ | phoronixrly 3 days ago | parent [-] | | I wonder whether Jensen Huang would be OK if we rolled these safeguards back to help power his DCs... |
|
|
| ▲ | DonHopkins 3 days ago | parent | prev | next [-] |
| As if copyright laws were ethical. |
| |
| ▲ | thih9 3 days ago | parent [-] | | Note: training constrained by copyright could still be an improvement over training that ignores copyright completely. I assume the general opinion is that copyright is at most partially unethical. That’s what the AI discussion is about too, i.e. artist copyright. | | |
| ▲ | nsvd2 3 days ago | parent [-] | | Given the extent to which the copyright system has benefited corporations and publishing companies to the detriment of individual authors and the general public, I'm constantly surprised that it still has many apologists. | | |
| ▲ | tmtvl 3 days ago | parent | next [-] | | As we don't live in a world where the rich patronize the arts, some sort of copyright system is the only way authors and artists are gonna make a living doing their thing. ...though I suppose proponents of Universal Basic Income (UBI) would disagree, but between the abolition of copyright, the institution of UBI, or a 7 year old child being hit by 7 lightning strikes and 7 meteor impacts and surviving, the latter seems the most likely. | |
| ▲ | thih9 3 days ago | parent | prev | next [-] | | What do you suggest instead? I.e. what would benefit individual authors more? | |
| ▲ | PunchyHamster 3 days ago | parent | prev [-] | | People imagine a poor author having their work stolen, rather than a poor author whose IP a corporation takes by contract agreement (and if you don't sign, you don't get the job) and then abuses for 70+ years |
|
|
|
|
| ▲ | fouc 3 days ago | parent | prev | next [-] |
| It would be interesting to talk with a Victorian-era chatbot, complete with Victorian-era ethics, and to see how much it would diverge from modern-era ethics. |
|
| ▲ | verdverm 4 days ago | parent | prev | next [-] |
| Wouldn't that training data be beyond the copyright protection point, making it a no-op? |
|
| ▲ | ImHereToVote 3 days ago | parent | prev | next [-] |
| I believe the works are no longer under copyright. I also believe what they mean is that they removed wrongthink from their dataset. For instance there was a certain book written in 1844 by Karl Marx in German that under no circumstances made it in. This ofc means that the LLM is completely pointless. https://www.marxists.org/archive/marx/works/date/index.htm |
| |
|
| ▲ | scoot 3 days ago | parent | prev [-] |
| If training data of any kind violated copyright, every creator alive would be in breach of copyright by virtue of any influence their “training data” (lifelong exposure to the work of others) has on their output. The creators crying foul over AI are painting themselves into a corner, both literally and figuratively. |
| |
| ▲ | miyoji 3 days ago | parent [-] | | This is a truly awful argument that keeps coming up. It relies on a false equivalence between training an AI (a technical process that involves copying a work into computer storage) and a human being experiencing a work, which doesn't involve any kind of copying (and usually involves the human legally purchasing the work, which AI companies did not do). There is a legal difference as well as a technical difference. AIs don't learn the same way human brains do, and the law does not treat these things the same. You may want to draw an analogy between the two and say they're "basically the same", but they are not; they aren't the same at all, outside of a very weak analogy. Is training kind of sort of like human learning? Yes. That doesn't mean anything. Dogs are kind of sort of like children, but if you try to treat your child the way you treat your dog, you end up in prison. Because children aren't dogs, either in reality or in the eyes of the legal system. Please, AI boosters, stop using this one. Human brains aren't clocks. Human brains aren't computers. Human brains aren't LLMs. AI training does not mimic human learning in any significant way. |
|