echelon | 4 days ago
Too bad. The OSI owns "open source". Big tech has been abusing open source to cheaply capture most of the internet and e-commerce anyway, so perhaps it's time we walked away from the term altogether. The OSI has abdicated the future of open machine learning. And that's fine. We don't need them.

"Free software" is still a thing, and it means a very specific and narrow set of criteria. [1, 2]

There's also "Fair software" [3], which walks the line between CC BY-NC-SA and shareware, but also sticks it to big tech by preventing Redis/Elasticsearch-style capture by the hyperscalers. There's an open game engine [4] with a pretty nice "Apache + NC"-type license.

---

Back on the main topic of "open machine learning": since the OSI fucked up, I came up with a ten-point scale here [5] defining open AI models. It's just a draft, but if other people agree with the idea, I'll publish a website about it (so I'd appreciate your feedback!)

There are ten measures by which a model can/should be open:

1. The model code (pytorch, whatever)

2. The pre-training code

3. The fine-tuning code (which might be very different from the pre-training code)

4. The inference code

5. The raw training data (pre-training + fine-tuning)

6. The processed training data (which might vary across the different stages of pre-training and fine-tuning: different sizes, features, batches, etc.)

7. The resultant weights blob(s)

8. The inference inputs and outputs (which also need a license; see also usage limits like OpenRAIL)

9. The research paper(s) (hopefully the model is also described and characterized in the literature!)

10. The patents (or lack thereof)

A good open model will have nearly all of these made available. A fake "open" model might only give you two of the ten. A rough sketch of what this could look like as a machine-readable scorecard is below, after the links.

---

[2] https://en.wikipedia.org/wiki/Free_software

[3] https://fair.io/
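To make the scale concrete, here's a minimal sketch of the ten criteria as a machine-readable scorecard. The field names, structure, and scoring are my own invention for illustration, not part of any published spec:

    # Hypothetical scorecard for the ten-point openness scale above.
    # All names here are made up for this sketch.
    from dataclasses import dataclass, fields

    @dataclass
    class OpennessScorecard:
        model_code: bool = False               # 1. model architecture code (pytorch, whatever)
        pretraining_code: bool = False         # 2. pre-training code
        finetuning_code: bool = False          # 3. fine-tuning code
        inference_code: bool = False           # 4. inference code
        raw_training_data: bool = False        # 5. raw pre-training + fine-tuning data
        processed_training_data: bool = False  # 6. processed/batched training data
        weights: bool = False                  # 7. the resultant weights blob(s)
        io_license: bool = False               # 8. license on inference inputs/outputs
        papers: bool = False                   # 9. research paper(s)
        patent_grant: bool = False             # 10. patents (or explicit non-assertion)

        def score(self) -> int:
            """Number of the ten openness criteria the release satisfies."""
            return sum(getattr(self, f.name) for f in fields(self))

    # A typical "weights plus a paper" release scores 2/10 on this scale.
    weights_only = OpennessScorecard(weights=True, papers=True)
    print(f"{weights_only.score()}/10")

A simple boolean checklist like this loses nuance (e.g., partially released data), but it makes the gap between a genuinely open model and a weights-only "open" model easy to compare at a glance.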