kgeist 3 hours ago

Binaries are copyrightable in both the US and the EU, and they are not technically produced by a human either; they're produced by a computer program. I honestly don't understand why this isn't extended to AI-generated code. Isn't it the same thing? One could argue that compilers merely transform source code into binaries "as is," while AI models have some "knowledge" baked in that they extract and paste as code. But there are compilers that also generate binaries by selecting ready-to-use binary patches authored by compiler developers and combining them into a program. One could also argue that, in the case of compilers, at least the input source code is authored by a human. But why can't we treat prompts as "source code in natural language" too? Where is the line between authorship and non-authorship, and how is the line defined? "Your prompt was too basic to constitute authorship" doesn't sound like an objective criterion.

Maybe for lawyers, AI is some kind of magical thing on its own. But having successfully created a working inference engine for Qwen3, and seeing how the core loop is just ~50 lines of very simple matrix multiplication code, I can't see LLMs as anything more than pretty simple interpreters that process "neural network bytecode," which can output code from pre-existing templates just like some compilers. And I'm not sure how this is different from transpilers or autogenerated code (like server generators based on an OpenAPI schema).
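
To make the "simple interpreter" picture concrete, here is a rough toy sketch of what a greedy decoding loop looks like. This is not the Qwen3 engine mentioned above: the 3-token vocabulary, the single weight matrix, and the names `matvec`, `argmax`, `generate`, `TOY_EMBEDDINGS` and `TOY_WEIGHTS` are all made up for illustration, and a real engine also has attention, normalization, a KV cache and sampling.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def generate(weights, embeddings, prompt_tokens, max_new_tokens):
    """Greedy decoding: the fixed loop 'interprets' the weights as data."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        hidden = embeddings[tokens[-1]]   # look up the last token's vector
        logits = matvec(weights, hidden)  # one "layer" of matrix math
        tokens.append(argmax(logits))     # pick the highest-scoring token
    return tokens


# Hypothetical toy "model": one-hot embeddings and weights chosen so that
# token t is always followed by token (t + 1) % 3.
TOY_EMBEDDINGS = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
TOY_WEIGHTS = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

print(generate(TOY_WEIGHTS, TOY_EMBEDDINGS, [0], 4))
```

The loop itself never changes; all of the behavior lives in the numbers it is fed, which is the sense in which the weights act like bytecode.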

Sure, if an LLM was trained on GPL code, it's possible it may output GPL-licensed code verbatim, but that's a different matter from the question of whether AI-generated code is copyrightable in principle.

Interestingly, I found an opinion here [0] that binaries technically shouldn't be copyrightable, and currently they are because:

  the copyright office listened to software publishers, and they wanted binaries protected by copyright so they could sell them that way
[0] https://freesoftwaremagazine.com/articles/what_if_copyright_...
wahern 3 hours ago

That linked opinion overstates the case. In the real world, two different programs performing any non-trivial but functionally identical task will look substantially dissimilar in their source code, and that dissimilarity will carry over to the compiled binary, meaning what was expressive (if anything) is largely preserved. To the extent two different programs do end up with identical code, then that aspect was likely primarily functional and non-copyrightable, or at least the expressive character didn't carry over to the binary. Ordering and naming of APIs in source code can be expressive, and that indeed is often lost (literally or at least the expressive character) during the compilation process, but there are other expressive aspects to software programming that will be preserved and protected in the binary form.

IMO, your intuition regarding AI is right--it's not a magic copyright laundering machine, and AFAIU courts have very quickly agreed that infringement is occurring. But in copyright law, establishing infringement (or the possibility of infringement) is the easy, straightforward part. Copyright infringement liability is a much more complex question. Transformative uses in particular are a Fair Use, and Fair Use is technically treated as an affirmative defense to infringement.[1] If something is Fair Use, infringement is effectively presumed. But Fair Uses are typically very fact-intensive questions, and unlike the case with search engines I'm not sure we'll get to the point where there's a well-defined fence protecting "AI".

[1] There's a scholarly pedantic debate about whether Fair Use is properly a "defense", rather than "exception" to infringement, but it walks and talks like a defense in the sense that the defendant has the burden of proving Fair Use after the plaintiff has established infringement. There's a similarly pedantic (though slightly more substantive) debate in criminal law regarding affirmative defenses. But the very term "affirmative defense" was coined to recognize and avoid these pedantic debates.