kachapopopow 21 hours ago

The GPL, however, does put restrictions on it, even the tokenizer. It was specifically crafted so that even if you do not have any GPL-licensed source code in your project, but the project was built on top of GPL code, you are still bound by the GPL's limitations.

The only reason user mode is not affected is that there is an exclusion for it, and only via a defined communication protocol. If you go around it or attempt to put a workaround in the kernel, guess what: it still violates the license. The point is: it is very restrictive.

ronsor 21 hours ago | parent [-]

> The GPL, however, does put restrictions on it, even the tokenizer. It was specifically crafted so that even if you do not have any GPL-licensed source code in your project, but the project was built on top of GPL code, you are still bound by the GPL's limitations.

This is not how copyright law works. The GPL is a copyright license, as stated by the FSF. Something which is not subject to copyright cannot be subject to a copyright license.

kachapopopow 20 hours ago | parent [-]

The GPL is not only a copyright license; it also covers multiple types of intellectual property rights. That is especially true of GPL-3, which has explicit IP protection, while in GPL-2 it is implicit. So yeah, you're partially right for GPL-2 and wrong for GPL-3.

ronsor 20 hours ago | parent [-]

It's true that GPLv3 covers patents, but it is still primarily a copyright license.

The tokenizer's tokens aren't patented, for sure. They can't be trademarked (they don't identify a product or service). They aren't a trade secret (the data is public). They aren't copyrighted (not a creative work). And the GPL explicitly preserves fair use rights, so there are no contractual restrictions either.

A tokenizer is effectively a list of the top-n most common byte sequences. There's simply no basis in law for it to be subject to copyright or any other IP law in the average situation.
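
To make the "list of the top-n most common byte sequences" description concrete, here is a minimal Python sketch. It is a simplification for illustration only: real tokenizers such as BPE build their vocabulary through iterative merges rather than a single frequency count, and the function name `top_byte_sequences` and its parameters are hypothetical.

```python
from collections import Counter


def top_byte_sequences(corpus: bytes, max_len: int = 4, top_n: int = 5) -> list[bytes]:
    """Return the top_n most frequent byte sequences of length 2..max_len.

    Illustrative only: this is a plain frequency count, not a real
    tokenizer training algorithm.
    """
    counts: Counter[bytes] = Counter()
    for length in range(2, max_len + 1):
        # Slide a window of the given length over the corpus.
        for i in range(len(corpus) - length + 1):
            counts[corpus[i:i + length]] += 1
    return [seq for seq, _ in counts.most_common(top_n)]


# Hypothetical toy corpus; the resulting "vocabulary" is just frequent byte runs.
vocab = top_byte_sequences(b"the cat and the hat and the bat", top_n=3)
```

The output is a purely statistical artifact of the input text, which is the property the argument above relies on.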

kachapopopow 19 hours ago | parent [-]

I mean, okay, sure, there is no legal framework for tokenizers, but what about the rest of the model? I think there is a much stronger argument there. And you could realistically extend the logic: if the model is GPL-2.0 licensed, you have to provide all the tools to replicate it, which would include the tokenizer.