kouteiheika 5 hours ago

> With the one AI, we can do word-to-image to generate an image. Clearly, that is a derived work of the training set of images

> The question of whether AI is stealing material depends exactly on what the training pathway is; what it is that it is learning from the data.

No it isn't. The question of whether AI is stealing material has little to do with the training pathway, but everything to do with scale.

To give a very simple example: is your model a trillion-parameter model trained on only 1000 images? It's going to memorize them.

Is your model a 3-billion-parameter model trained on trillions of images? It's going to generalize, because it simply doesn't have the physical capacity to memorize its training data; and assuming you've deduplicated the dataset, it won't memorize any single image either.
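To make the capacity argument concrete, here's a back-of-envelope sketch. The ~2 bits of storable information per parameter is an assumption (published memorization estimates for language models are in this rough ballpark); only the orders of magnitude matter:

```python
# Back-of-envelope: memorization capacity available per (deduplicated)
# training image. BITS_PER_PARAM is an assumed figure, not a measurement.
BITS_PER_PARAM = 2

def bits_per_image(params: int, images: int) -> float:
    """Rough upper bound on bits the model could devote to any one image."""
    return params * BITS_PER_PARAM / images

# Trillion-parameter model, 1000 images: ~2e9 bits (~250 MB) of capacity
# per image -- far more than any image contains, so memorization is easy.
print(bits_per_image(10**12, 1_000))

# 3-billion-parameter model, 2 trillion images: ~0.003 bits per image --
# nowhere near enough to store any single image, so it has to generalize.
print(bits_per_image(3 * 10**9, 2 * 10**12))
```

The exact bits-per-parameter constant doesn't change the conclusion; the two regimes differ by twelve orders of magnitude.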

It literally makes no difference whether you use the "trained on the same scene, one in daylight and one at night" objective or the "generate the image from a description" objective here. Depending on how you pick your hyperparameters, you can trivially make either one memorize its training data (i.e., in your words, make its output "clearly a derived work of the training set of images").
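As a toy illustration (a pure-Python stand-in, not the image-training setup itself): the same curve-fitting task either memorizes or generalizes depending on one capacity hyperparameter, the number of parameters relative to the number of training points:

```python
import random

random.seed(0)
xs = [i / 4 - 1 for i in range(9)]                 # 9 points in [-1, 1]
ys = [x ** 2 + random.gauss(0, 0.05) for x in xs]  # quadratic trend + noise

def lagrange_predict(x, xs, ys):
    """Degree-8 interpolating polynomial: one coefficient per data point,
    so it reproduces every training label exactly -- pure memorization."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line_fit(xs, ys):
    """Two-parameter least-squares line: too small to store 9 noisy labels,
    so it can only capture the broad trend -- generalization."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Worst-case training error: the big model hits every point, the small
# one cannot -- same data, same task, only the capacity knob differs.
memo_err = max(abs(lagrange_predict(x, xs, ys) - y) for x, y in zip(xs, ys))
slope, intercept = line_fit(xs, ys)
gen_err = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))
```

Here `memo_err` is at floating-point noise while `gen_err` is substantial: the memorize/generalize outcome was decided entirely by the parameters-to-data ratio, not by what was being fitted.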