I completely share your sentiment about feeling irked about open source code being used to train commercial AI models. However, I think the battle is already lost - the nature of copyright and open source code philosophy (currently) means that there isn't any real way of preventing your code being used to train AI. Look at the legal precedents being set in courts, where many of the Big Tech companies have actually pirated copyrighted media to train their AI, and the courts have said "that's acceptable". (Of course, the actual act of piracy - like Facebook did by downloading copyrighted material through torrents - may not be legal, but the courts may be lenient here too, as there seems to be an undercurrent of government approval to do anything to win the "AI Race".)

And even if you move your repository somewhere else, can you really prevent anyone from uploading it to GitHub? To do so, you may have to create your own open source license.

lelanthran an hour ago | parent [-]

> However, I think the battle is already lost - the nature of copyright and open source code philosophy (currently) means that there isn't any real way of preventing your code being used to train AI.

Laws should make it a double-edged sword, make distillation explicitly legal.

Not much else they can do.

overfeed 38 minutes ago | parent [-]

> Laws should make it a double-edged sword, make distillation explicitly legal.

Knowledge distillation is already legal. Current case law says that none of the outputs of any model are protected by copyright, so anyone can use them for whatever they want, including distillation. That is why the AI companies resort to ToS clauses to block distillation and/or training competing models.