Remix.run Logo
ronsor 20 hours ago

It's true that GPLv3 covers patents, but it is still primarily a copyright license.

The tokenizer's tokens aren't patented, for sure. They can't be trademarked (they don't identify a product or service). They aren't a trade secret (the data is public). They aren't copyrighted (not a creative work). And the GPL explicitly preserves fair use rights, so there are no contractual restrictions either.

A tokenizer is effectively a list of the top-n most common byte sequences. There's simply no basis in law for it to be subject to copyright or any other IP law in the average situation.

kachapopopow 19 hours ago | parent [-]

I mean okay sure, there is no legal framework for tokenizers, but what about the rest of the model I think there is a much stronger argument there? And you could realistically extend the logic that if the model is GPL-2.0 licensed you have to provide all the tools to replicate it which would include the tokenizer.