progval 2 hours ago
> There are research models out there which are trained on only permissively licensed data

Models whose authors tried to train only on permissively licensed data. For example, https://huggingface.co/bigcode/starcoder2-15b was meant to be trained on a permissively licensed dataset, but the dataset was filtered only on repository-level license, not file-level. So when searching for "under the terms of the GNU General Public License" on https://huggingface.co/spaces/bigcode/search-v2 back when it was working, you would find it was trained on many files with a GPL header.
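To make the repository-level vs file-level distinction concrete, here is a rough sketch of what a file-level check could look like. This is not how the StarCoder2 data was actually filtered; `has_gpl_header` and `drop_gpl_files` are hypothetical names, and matching a single phrase in the first 30 lines is an assumption that would miss many real license notices.

```python
# Hypothetical file-level filter: drop files whose header mentions the GPL,
# regardless of what license the enclosing repository declares.
GPL_MARKER = "under the terms of the GNU General Public License"


def has_gpl_header(source: str, header_lines: int = 30) -> bool:
    """Return True if the GPL phrase appears in the first `header_lines` lines."""
    cleaned = []
    for line in source.splitlines()[:header_lines]:
        # Strip whitespace and common comment prefixes so that a notice
        # wrapped across several comment lines still matches.
        cleaned.append(line.strip().lstrip("/*#;- ").strip())
    normalized = " ".join(" ".join(cleaned).split())
    return GPL_MARKER.lower() in normalized.lower()


def drop_gpl_files(files: dict[str, str]) -> dict[str, str]:
    """Keep only the files (path -> contents) without a GPL header."""
    return {path: src for path, src in files.items() if not has_gpl_header(src)}
```

A repository-level filter would keep or drop every file in a repository based on one declared license, so a GPL file vendored into an MIT repository passes through. A per-file check like the one above catches that case, at the cost of scanning every file.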