yjftsjthsd-h 2 days ago
> I care that I know what I can DO with the project when I see it described as "open source".

Yes, and the first of those things is that you should be able to build it from source, which requires the source code and, in this case, the data.
simonw 2 days ago
The OSI's take on this is that an open source model can be modified through fine-tuning etc., even if you can't rebuild it from scratch.

The problem with requiring "build from scratch" for open source models is that the number of interesting models whose training data can be openly licensed is close to zero. If you trained your model on an unlicensed scrape of the web, you can't release that data under an open source license!

The Open Source Initiative lays out a bunch of their thinking on this in the FAQ for their "Open Source AI Definition": https://opensource.org/ai/faq#isn-t-training-data-required-t...

rogerrogerr 2 days ago
They’ll never reveal the data, because that would reveal this is all built on stolen work.