NitpickLawyer 2 hours ago

Interesting.

> the complaint is not about code generation, it's about ingesting someone else's code, frequently for profit.

Why do you think that is, and what complaint specifically? I was talking about this:

> The Copyright Office reviewed the decision in 2022 and determined that the image doesn't include “human authorship,” disqualifying it from copyright protection

There is no mention of training there. In fact, if you read the appeals court case [1], they don't mention training either:

> We affirm the denial of Dr. Thaler’s copyright application. The Creativity Machine cannot be the recognized author of a copyrighted work because the Copyright Act of 1976 requires all eligible work to be authored in the first instance by a human being. Given that holding, we need not address the Copyright Office’s argument that the Constitution itself requires human authorship of all copyrighted material. Nor do we reach Dr. Thaler’s argument that he is the work’s author by virtue of making and using the Creativity Machine because that argument was waived before the agency.

I have no idea where you got the idea that this was about training data. Neither the Copyright Office nor the appeals court even mentions it.

But anyway, since we're here, let's entertain this. So you're saying that training data is the differentiator. OK. So in that case, would training on "your own data" make this ok with you? Would training on "synthetic" data be ok? Would a model that sees no "proprietary" code be ok? Would a hypothetical model trained just on RL with nothing but a compiler and endless compute be ok?

The courts seem to hint that "human authorship" is still required. I see no end to the "... but what about x" questions, as I said in my first comment. I was honestly asking them, because the crux of this case rests on "human authorship of the piece to be copyrighted," not on anything prior.

[1] - https://fingfx.thomsonreuters.com/gfx/legaldocs/egpblokwqpq/...

lelanthran 2 hours ago | parent [-]

> There is no mention of training there. In fact, if you read the appeals court case [1], they don't mention training either:

> ...

> I have no idea where you got the idea that this was about training data. Neither the copyright office nor the appeals court even mention this.

In both the story and the comments, that's the prevailing complaint. FTFA:

> Their claim that it is a “complete rewrite” is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a “clean room” implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.

I mean, I know it's passé to read the story, but I still do it so my comments are on the story, not just the title taken out of context.

> But anyway, since we're here, let's entertain this. So you're saying that training data is the differentiator.

Well, that's the complaint in the story and in the comment section, so it makes sense to address that and that alone.

> OK. So in that case, would training on "your own data" make this ok with you?

Yes.

> Would training on "synthetic" data be ok?

If the provenance of the "synthetic" data does not depend on some upstream model ingesting someone else's work, then yes.

> Would a model that sees no "proprietary" code be ok?

If the model does not depend on someone else's work, then yes.

> Would a hypothetical model trained just on RL with nothing but a compiler and endless compute be ok?

Yes.

*Note: To clarify, "someone else's work" means work by someone who has not consented to, or licensed, its ingestion and subsequent reproduction under the terms that AI/LLM training entails. If someone licensed their work to you for training a model, then have at it.

NitpickLawyer 2 hours ago | parent [-]

Ah! I think I see where the confusion came from. I was quoting something from another comment and commenting specifically on that.

> > To me it sounds like the AI-written work cannot be copyrighted

I was only commenting on that.