| ▲ | dathinab 3 hours ago |
| > Blanchard is, of course, familiar with the source code, he's been its maintainer for years. I would argue it's irrelevant whether they looked at the code or not, and whether he was or wasn't familiar with it. What matters is that they fed the original code into a tool which they set up to make a copy of it. How that tool works doesn't really matter. Neither does it make a difference if you obfuscate that it's a copy: if I blindfold myself while making copies of books with a book scanner and printer, I'm still engaging in copyright infringement. If AI is a tool, that should hold. If it isn't "just" a tool, then it engaged in copyright infringement itself (as it created the new output side by side with the original), in the same way an employee might do so on the orders of their boss. That still makes the boss/company liable for copyright infringement, and in general, just because you weren't the one who created an infringing product doesn't mean you aren't more or less as liable for distributing it as if you had created it yourself. |
|
| ▲ | Legend2440 15 minutes ago | parent | next [-] |
| > that they fed the original code into a tool which they set up to make a copy of it Well, no. They fed the spec (test cases, etc.) into a tool which made a new program matching the spec. That is not a copy of the original code. But this also feels like arguing over the color of the iceberg while the Titanic sinks. If you have a tool that can write code to spec, what is the value of source code anymore? Even if your app is closed-source, you can just tell Claude to write new code that does the same thing. |
|
| ▲ | spullara 3 hours ago | parent | prev | next [-] |
| If the actual text of the code isn't the same or obviously derivative, copyright doesn't apply at all. |
| |
| ▲ | sigseg1v 2 hours ago | parent | next [-] | | What does derivative mean here? Because IMO it means that the existing work was used as input. So if you used an LLM and it was trained on the existing work, that's a derivative work. If you rot13-encode something as input, so you can't personally read it, and then a device applies rot13 to it again and outputs the result, that's a derivative work. | | |
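The rot13 analogy above relies on rot13 being its own inverse: applying it twice restores the original text, so the "unreadable" intermediate never stops being a transformation of the input. A minimal sketch of that round-trip (the sample string is illustrative):

```python
import codecs

original = "Copyrighted text"

# rot13 shifts each letter 13 places, making it unreadable to the operator.
encoded = codecs.encode(original, "rot13")

# Applying rot13 a second time recovers the original exactly.
decoded = codecs.encode(encoded, "rot13")

assert encoded != original
assert decoded == original
```

The point of the analogy: no step in this pipeline requires anyone to read the intermediate form, yet the output is byte-for-byte derived from the input.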
| ▲ | spullara 2 hours ago | parent | next [-] | | In order for it to be creatively derivative you would need to copy the structure, logic, organization, and sequence of operations not just reimplement the functionality. It is pretty clear in this case that wasn't done. | |
| ▲ | ghostpepper 2 hours ago | parent | prev | next [-] | | As a cynical person, I assume all the frontier LLMs were trained on datasets that include every open source project, but as a thought experiment: if an LLM was trained on a dataset that included every open source project _except_ chardet, do you think said LLM would still be able to easily implement something very similar? | | | |
| ▲ | nicole_express 2 hours ago | parent | prev | next [-] | | Of course, the problem with this interpretation is that all modern LLMs are derivatives of huge amounts of text under completely different licenses, including "All rights reserved", and therefore could not be used for any purpose. I'm not sure how you square the circle of "it's alright to use the LLM to write code, unless the code is a rewrite of an open source project to change its license". | |
| ▲ | satvikpendem 2 hours ago | parent | prev | next [-] | | > Because IMO it means that the existing work was used as input That's your opinion (since you said "IMO"), not the actual legal definition. | |
| ▲ | bmcahren 2 hours ago | parent | prev | next [-] | | LLMs do not encode nor encrypt their training data. The fact that they can recite some training data is a defect, not a default. You can see this with simple arithmetic: compare the model's size to its training data, even assuming a fantasy compression algorithm 50% better than the state of the art. You'd find the weights would still be missing 80-90% of the training data, even if the model were as much of a stochastic parrot as you may be implying. The outputs of AI are not derivative just because the model saw training data that included the original library. Then onto prompting: 'He fed only the API and (his) test suite to Claude' This is Google v Oracle all over again - are APIs copyrightable? | | |
| ▲ | satvikpendem 2 hours ago | parent [-] | | > This is Google v Oracle all over again - are APIs copyrightable? Yes, this is the best way to frame the question. If I take a public-facing API and reimplement everything behind it, whether by human or machine, that should be sufficient to avoid infringement. After all, that's what Google did, and it's not as if their engineers had never read a single line of the Java source code. Even in "clean room" implementations, a human might still have remembered or recalled a previous implementation of some function they had encountered before. |
| |
| ▲ | wizzwizz4 2 hours ago | parent | prev [-] | | See also: https://monolith.sourceforge.net/, which seeks to ask the question: > But how far away from direct and explicit representations do we have to go before copyright no longer applies? |
| |
| ▲ | yorwba an hour ago | parent | prev | next [-] | | Copyright protects even very abstract aspects of human creative expression, not just the specific form in which it is originally expressed. If you translate a book into another language, or turn it into a silent movie, none of the actual text may survive, but the story itself remains covered by the original copyright. So when you clone the behavior of a program like chardet without referencing the original source code except by executing it to make sure your clone produces exactly the same output, you may still be infringing its copyright if that output reflects creative choices made in the design of chardet that aren't fully determined by the functional purpose of the program. | |
| ▲ | NSUserDefaults an hour ago | parent | prev [-] | | If you pirate a movie and reencode it, does that apply as well? You can still watch the movie and it is “obviously” the same movie. Here you can use the program and it is, to the user, also the same. |
|
|
| ▲ | margalabargala 2 hours ago | parent | prev [-] |
| > If it isn't "just" a tool, then it did engage in copyright infringement Copyright infringement is a thing humans do, and an AI is not a human. Just like how photos taken by a monkey with a camera have no copyright: human law binds humans. |
| |
| ▲ | malicka 2 hours ago | parent [-] | | Correct. The human who shares the copy is the one who engages in copyright infringement. | | |
| ▲ | margalabargala an hour ago | parent [-] | | So, let's say that rather than actually touching any copyrighted material, a human merely tells an AI how to go onto the internet and find copyrighted material, download it, and ingest it for training. The AI, fully autonomously, does so, and after training itself on the material deletes it, so no human ever downloads, consumes, or shares it. If we are saying AI is "more than a tool", which seems to be the direction courts are leaning, since they've ruled that AI output without direct human involvement is not copyrightable[0], then the above seems like it would be entirely legal. [0] https://www.copyright.gov/newsnet/2025/1060.html |
|
|