I wonder about this training data. There's so much profit being made from open source code in training data. In fact, most of the code the model was trained on was open source, so shouldn't the model be free then? Or at least open weight?