Remix.run Logo
gwern 2 hours ago

The solution here seems to be to impose some constraint or requirement which means that literal copying is impossible (remember, copyright governs copies, it doesn't govern ideas or algorithms - that would be 'patents', which essentially no open source software has) or where any 'copying' from vaguely remembered pretraining code is on such an abstract indirect level that it is 'transformative' and thus safe.

For example, the Anthropic Rust C compiler could hardly have copied GCC or any of the many C compilers it surely trained on, because then it wouldn't have spat out reasonably idiomatic and natural looking Rust in a differently organized codebase.

Good news for Rust and Lean, I guess, as it seems like everyone these days is looking for an excuse to rewrite everything into those for either speed or safety or both.

pron 2 hours ago | parent [-]

> copyright governs copies, it doesn't govern ideas or algorithms

The second part is true. The first is a little trickier. The copyright applies to some fixed media (text in this case) rather than the idea expressed, but the protections extend well beyond copies. For example, in fiction, the narrative arc and "arrangement" is also protected, as are adaptations and translations.

If you were to try and write The Catcher in the Rye in Italian completely from memory (however well you remember it) I believe that would be protected by copyright even if not a single sentence were copied verbatim.