Remix.run Logo
injidup 2 days ago

You got it the wrong way round. It's more akin to.

1. Training data is the source. 2. Training is compilation/compression. 3. Weights are the compiled source akin to optimized assembly.

However it's an imperfect analogy on so many levels. Nitpick away.

mirekrusin 2 days ago | parent [-]

It's dataset [0] released under some source available license or OSI license, ie. open dataset or open source dataset.

[0] https://news.ycombinator.com/item?id=47758408

necovek a day ago | parent [-]

So is it open dataset or open source dataset?

Eg. it is no accident Creative Commons is using different terminology for non-software works.

mirekrusin 21 hours ago | parent [-]

"Open Source" is normally reserved for OSI approved licenses but there are many non-OSI approved, source available licenses as well.

For example gemma4 is released under Apache 2.0 license – and can be called open source dataset.

On the other hand ie. deepseek, while publicly available weights model, is not released under OSI approved license, they released it under their own "Deepseek License Aggreement" – ie. in general it's free to use as normal OSI license but has some restrictions, ie. military use is explicitly forbidden.