zozbot234 4 hours ago

If you ask an LLM to derive a spec that contains no expressive element of the original code (a clean-room human team can carefully verify this), and then ask another instance of the LLM (with fresh context) to write code from that spec, how is that different from a "clean room" rewrite? The agent that writes the new code only ever sees the spec, and by assumption (the same assumption made in all clean-room rewrites) the spec is purely factual, with all copyrightable expression distilled out.

gf000 4 hours ago | parent | next [-]

I guess it depends on whether the original source is part of the training data or not (if it's open source, it likely is).

A lawyer could easily argue that the model itself stores a representation of the original, and thus it can never truly start from a "fresh context".

And to be perfectly honest, LLMs can quote a lot of text verbatim.

k__ an hour ago | parent | prev | next [-]

How do you prove the training data didn't contain the code?

I'd assume an LLM trained on the original would also be contaminated.

miroljub 4 hours ago | parent | prev [-]

The new agent that writes the code probably has at least parts of the original code in its training data.

We can't speak of a clean-room implementation by an LLM, since LLMs are technically capable only of regurgitating their training data in different ways, not of any original creation.

dizhn 2 hours ago | parent | next [-]

The conclusion of this would be that you can never license AI-generated code, since you can't get a release from the original authors.

Of course, in practice it would work in exactly the opposite fashion, and AI-generated code would be immune even if it copied code verbatim.

jesterswilde 2 hours ago | parent [-]

I don't see what's wrong with that, personally. If I pirated someone's software, sold it as my own, and got caught, the fact that I sold a bunch of it doesn't mean the people who bought it are now in the clear. They are still using bootleg software in their business.

nubg 4 hours ago | parent | prev [-]

Only in the case of open source code