lelanthran 3 hours ago

All of your questions have seemingly trivial answers. Maybe I am missing something, but...

> If "generated" code is not copyrightable, where do we draw the line on what generated means? Do macros count?

Does the output of the macro depend on ingesting someone else's code?

> Does code that generates other code count?

Does the output of the code depend on ingesting someone else's code?

> Protobuf?

Does your protobuf implementation depend on ingesting someone else's code?

> If it's the tool that generates the code, again where do we draw the line?

Does the tool depend on ingesting someone else's code?

> Is it just using 3rd party tools?

Does the 3rd party tool depend on ingestion of someone else's code?

> Would training your own count?

Does the training ingest someone else's code?

> Would a "random" code gen and pick the winners (by whatever means) count?

Does the random codegen depend on ingesting someone else's code?

> Bruteforce all the space (silly example but hey we're in silly space here) counts?

Does the bruteforce algo depend on ingesting someone else's code?

> Is it just "AI" adjacent that isn't copyrightable?

No, it's the "depends on ingesting someone else's code" that makes it not copyrightable.

> If so how do you define AI?

It doesn't matter whether it is AI or not; the question is whether you are ingesting someone else's code.

> Does autocomplete count?

Does the specific autocomplete in question depend on ingesting someone else's code?

> Intellisense?

Does the specific Intellisense in question depend on ingesting someone else's code?

> Smarter intellisense?

Does the specific Smarter Intellisense in question depend on ingesting someone else's code?

...

Look, I see where you're going with this - reductio ad absurdum and all - but it seems to me that you're trying to muddy the waters by claiming that either all code generation is allowed or no code generation is allowed.

Let me clear the waters for all the readers - the complaint is not about code generation, it's about ingesting someone else's code, frequently for profit.

All these questions you are asking seem to me to be irrelevant and designed to shift the focus from the ingestion of other people's work to something that no one is arguing against.

NitpickLawyer 2 hours ago

Interesting.

> the complaint is not about code generation, it's about ingesting someone else's code, frequently for profit.

Why do you think that is, and what complaint specifically? I was talking about this:

> The Copyright Office reviewed the decision in 2022 and determined that the image doesn't include “human authorship,” disqualifying it from copyright protection

There seems to be zero mention of training there. In fact, if you read the appeals court decision [1], it doesn't mention training either:

> We affirm the denial of Dr. Thaler’s copyright application. The Creativity Machine cannot be the recognized author of a copyrighted work because the Copyright Act of 1976 requires all eligible work to be authored in the first instance by a human being. Given that holding, we need not address the Copyright Office’s argument that the Constitution itself requires human authorship of all copyrighted material. Nor do we reach Dr. Thaler’s argument that he is the work’s author by virtue of making and using the Creativity Machine because that argument was waived before the agency.

I have no idea where you got the idea that this was about training data. Neither the copyright office nor the appeals court even mention this.

But anyway, since we're here, let's entertain this. So you're saying that training data is the differentiator. OK. So in that case, would training on "your own data" make this ok with you? Would training on "synthetic" data be ok? Would a model that sees no "proprietary" code be ok? Would a hypothetical model trained just on RL with nothing but a compiler and endless compute be ok?

The courts seem to hint that "human authorship" is still required. I see no end to the "... but what about x", as I stated in my first comment. I was honestly asking those questions, because the crux of the case here rests on "human authorship of the piece to be copyrighted", not on anything prior.

[1] - https://fingfx.thomsonreuters.com/gfx/legaldocs/egpblokwqpq/...

lelanthran 2 hours ago

> There seems to be zero mention of training there. In fact, if you read the appeals court decision [1], it doesn't mention training either:

> ...

> I have no idea where you got the idea that this was about training data. Neither the copyright office nor the appeals court even mention this.

In both the story and the comments, that's the prevailing complaint. FTFA:

> Their claim that it is a “complete rewrite” is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a “clean room” implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.

I mean, I know it's passé to read the story, but I still do it, so my comments are on the story, not just the title taken out of context.

> But anyway, since we're here, let's entertain this. So you're saying that training data is the differentiator.

Well, that's the complaint in the story and in the comment section, so it makes sense to address that and that alone.

> OK. So in that case, would training on "your own data" make this ok with you?

Yes.

> Would training on "synthetic" data be ok?

If the provenance of the "synthetic data" does not depend on some upstream ingestion of someone else's work, then yes.

> Would a model that sees no "proprietary" code be ok?

If the model does not depend on someone else's work, then yes.

> Would a hypothetical model trained just on RL with nothing but a compiler and endless compute be ok?

Yes.

*Note: Let me clarify that "someone else's work" means the work of someone who has not consented to, or licensed their work for, ingestion and subsequent reproduction under the terms on which AI/LLM training does it. If someone licensed you their work to train a model, then have at it.

NitpickLawyer 2 hours ago

Ah! I think I get where the confusion was. I was quoting something from another comment, and specifically commenting on that.

> > To me it sounds like the AI-written work can not be coppywritten

I was only commenting on that.

user34283 2 hours ago

I think the relevant question is whether the part whose copyrightability we want to determine is an intellectual invention of a human mind.

"Ingesting someone else's code" does not seem very useful here: it's hardly quantifiable, and I don't believe "ingestion" is the key question.