The argument that a rewrite is a copyright violation because its authors are familiar with the code base is not fully sound.

"Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.

Otherwise an artist who has seen a picture of a sunset over an empty ocean wouldn't be allowed to paint another sunset over an empty ocean, since people could claim copyright violation.

What is a violation, though, is placing the code side by side and trying to circumvent copyright law by just rephrasing the exact same code.

This also means that if you give an AI access to a code base and tell it to produce a new code base doing the same (or similar) thing, it will most likely be ruled a copyright violation, as it's pretty much a side-by-side rewrite.

But you very much can rewrite a project under a new license even if you have in-depth knowledge, IFF you don't have the old project open/look at it while doing so. Rewrite it from scratch. And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

Though doing so is not per se illegal, it is legally very attackable, as you will have a hard time defending such a rewrite against copyright claims (unless it's internally so completely different that it stops any claims of "being a copy", e.g. you use completely different algorithms, architecture, etc. to produce the same results in a different way).

In the end, while technically "legally hard to defend" != "illegal", for companies it's usually best to treat the two the same.

simiones 4 hours ago

> "Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.

On the contrary. Except for discussions about punitive damages and so on, insider knowledge or lack thereof is completely irrelevant to patent law. If company A has a patent on something, they can assert said patent against company B regardless of whether any person in company B had ever seen or heard of company A and their patent. Company B could have a legal trail proving they invented their product that matches the patent from scratch with no outside knowledge, and that they had been doing this before company A had even filed their patent, and it wouldn't matter at all - company A, by virtue of filing and being granted a patent, has a legal monopoly on that invention.

In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work. Even more, you have your own copyright over your own work, and can assert it over anyone that tries to copy your work without permission, despite an identical work existing and being owned by someone else.

Now, purely in principle this would remain true even if you had seen the other work. But in reality, it's impossible to convince any jury that you happened to produce, entirely out of your own creativity, an original work that is identical to a work you had seen before.

> But you very much can rewrite a project under a new license even if you have in-depth knowledge, IFF you don't have the old project open/look at it while doing so.

No, this is very much false. You will never be able to win a court case on this, as any significant similarity between your work and the original will be considered a copyright violation, per the preponderance of the evidence.

aleph_minus_one 3 hours ago

> In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work.

This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

> https://www.travelandleisure.com/photography/illegal-to-take...

> https://www.headout.com/blog/eiffel-tower-copyright/

simiones 2 hours ago

This has no relation to what I was saying. Taking a photo of a copyrighted work is a method for creating a copy of said work using a mechanical device, so it is of course covered by copyright (whether buildings or light shows fall under copyright is an irrelevant detail).

What I'm saying is that if you, say, create an image of a red oval in MS Paint, you have copyright over said image. If 2 years later I create an identical image myself having never seen your image, I also have copyright over my image - despite it being identical to your image, I have every right to sell copies of my image, and even to sue someone who distributes copies of my image without my permission (but not if they're distributing copies of your image).

But if I had seen your image of a red oval before I created mine, it's basically impossible for me to prove that I created my own image out of my own creativity, and I didn't just copy yours. So, if you were to sue me for copyright infringement, I would almost certainly lose in front of any reasonable jury.

chimeracoder 2 hours ago

> This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

That example is not analogous to the topic at hand.

But furthermore, it also is specific to French/European copyright law. In the US, the US Copyright Act would not permit restrictions on photographs of architectural works that are visible from public spaces.

jerrysievert an hour ago

actually, the US Copyright Act does in fact allow restrictions on photographs of architectural works that are visible from public spaces:

https://en.wikipedia.org/wiki/Portlandia_(statue)

the Portlandia statue is one such architectural work - and its creator is fairly litigious.

chimeracoder an hour ago

I don't know the details of that specific case so I can't speak to it, but the text of the AWCPA is very clear:

> The copyright in an architectural work that has been constructed does not include the right to prevent the making, distributing, or public display of pictures, paintings, photographs, or other pictorial representations of the work, if the building in which the work is embodied is located in or ordinarily visible from a public place.

This codifies an already-established principle in US law. French law does not have that same principle.

twoodfin 4 hours ago

If I read Mario Puzo’s The Godfather and then proceed to write a structurally identical novel with many of the same story beats and character types, it will not be difficult to convince a jury exposed to these facts that I’ve created a derivative work.

On the other hand, if I can prove to the jury’s satisfaction that I’ve never been exposed to Puzo’s work in any form, it’s independent creation.

Manuel_D an hour ago

To the contrary, there have been many cases of very similar novels with largely identical plot points and settings that survive copyright allegations, even if the author was exposed to the original work.

For a rather entertaining example (though raunchy, for a heads up): https://www.youtube.com/watch?v=zhWWcWtAUoY&themeRefresh=1

helsinkiandrew 4 hours ago

In the case of chardet, though, wouldn't it be more like you were the publisher of The Godfather, withdrawing it from print and releasing a novel with the same name, with much of the same plot and characters, but claiming the new version was an independent creation?

pocksuppet an hour ago

That's even worse for your case.

helsinkiandrew 5 hours ago

If the new maintainers used Claude as their “fancy code generator” (there’s a Claude.md file in the repository so it seems so) then it was almost certainly trained with the chardet source code.

oneeyedpigeon 5 hours ago

> And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

How different does the new code have to be from the old code and how is that measured?
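
To make the "how is that measured" part concrete, here is a minimal sketch of one crude, purely textual measure: compare the token sequences of two source files and report how much they overlap. It assumes both inputs are valid Python source, and the helper names `code_tokens` and `token_similarity` are made up for illustration. A score like this is not a legal test; it only shows how much of the surface text two versions share.

```python
import difflib
import io
import tokenize

# Token types that carry no code content (comments and layout only).
_SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def code_tokens(source: str) -> list[str]:
    """Return the token strings of Python source, ignoring comments and layout."""
    reader = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(reader)
        if tok.type not in _SKIPPED
    ]


def token_similarity(old: str, new: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the token sequences are identical."""
    matcher = difflib.SequenceMatcher(
        None, code_tokens(old), code_tokens(new), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    original = "def add(a, b):\n    return a + b\n"
    renamed = "def add(x, y):  # sum\n    return x + y\n"
    restructured = "import operator\nadd = operator.add\n"

    print("renamed variables:", token_similarity(original, renamed))
    print("different approach:", token_similarity(original, restructured))
```

Renaming variables still leaves most of the token sequence intact, while a structurally different implementation shares very little, which is roughly the distinction between "rephrasing the same code" and "writing fully new code". Where a threshold would sit is exactly the part nobody can answer from a number alone.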

larodi 4 hours ago

Nobody can tell, and this is how we entered these very turbulent modern times in which "everything can be retold" without punishment. LLMs are already doing it at scale. While the original author is correct in terms of the LGPL, it is nearly impossible to say how different the expression of an idea should be to be considered a separate one. This is a truly fundamental philosophical question that may not have an easy answer.

jmyeet 5 hours ago

This is a bad argument.

Think of a rewrite (by a human or an LLM) as a translation. If you wrote a book in English and somebody translated it into Spanish, it'd still be a copyright issue. The same goes for a rewrite of code.

That's very different to taking the idea of a body of work. So you can't copyright the idea of a pirate taking a princess hostage and a hero rescuing her. That's too generic. But even here there are limits. There have been lawsuits over artistic works being too similar.

Back to software, you can't copyright the idea of photo-editing software but you can copyright the source code that produces that software. If you can somehow prompt an LLM to produce photo-editing software, or if a person writes it themselves, then you have what's generally referred to as a "cleanroom" implementation and that's copyright-free (although you may have patent issues, which is a whole separate issue).

But even if you prompted an LLM that way, how did the LLM learn what it needed? Was the source code of another project an input in its training? This is a legal grey area, currently. But I suspect it's going to be a problem.

pera 4 hours ago

Suchir Balaji, the OpenAI researcher who was found dead in his flat just before testifying against his employer, published an excellent article somewhat related to this topic:

When does generative AI qualify for fair use? https://suchir.net/fair_use.html

Balaji's argument is very strong and I feel we will see it tested in court as soon as LLM license-washing starts getting more popular.

bsenftner 4 hours ago

Hate to be "that guy" but in a corrupt legal system, which ours is, none of this matters. Who has the influence and dollars to make the decision theirs is all that matters.

RcouF1uZ4gsC 5 hours ago

I think you could have an LLM produce a detailed written English description of the complete logic of the program and its tests.

Then use another LLM to produce code from that spec.

This would be similar to the cleanroom technique.

simiones 4 hours ago

Producing a copy of a copyrighted work through a purely mechanical process is a clear violation of copyright. LLMs are absolutely not different from a copier machine in the eyes of the law.

Original works can only be produced by a human being, by definition in copyright law. Any artifact produced by an animal, a mechanical process, a machine, a natural phenomenon etc is either a derived work if it started from an original copyrighted work, or a public domain artifact not covered by copyright law if it didn't.

For example, an image created on a rock struck by lightning is not a copyright-covered work. Similarly, an image generated by a diffusion model from a randomly generated sentence is not a copyrightable work. However, if you feed a novel as a prompt to an LLM and ask for a summary, the resulting summary is a derived work of said novel, and it falls under the copyright of the novel's owner - you are not allowed to distribute copies of the summary the LLM generated for you.

Whether the output of an LLM, or the LLM weights themselves, might be considered derived works of the training set of that LLM is a completely different discussion, and one that has not yet been settled in court.

robinsonb5 5 hours ago

Perhaps - but an argument might still be made that the result is a derivative work of the original, given that it's produced by feeding the original work through automated tooling.

But either way, deleting the original version from the repo and replacing it with the new version - as opposed to, say, archiving the old version and starting a new repo with the new version - would still be a dick move.

robin_reala 5 hours ago

Assuming the second LLM hadn’t been trained on the existing codebase. Which in this case we can’t know, but can assume that it was.

knollimar 5 hours ago

Does the second LLM have the codebase in its training?

9864247888754 5 hours ago

One could use Comma, which has only been trained on public domain and openly licensed texts: https://arxiv.org/pdf/2506.05209