Remix.run Logo
benob 3 hours ago

I don't think this would qualify as clean room (the Library was involved in learning to generate programs as a whole). However, it should be possible to remove the library from the OLMO training data and retrain it from scratch.

But what about training without having seen any human written program? Coul a model learn from randomly generated programs?

foota 2 hours ago | parent [-]

> I don't think this would qualify as clean room (the Library was involved in learning to generate programs as a whole)

Hm... I mean this is really one for the lawyers, but IMO you would likely successfully be able to argue that the marginal knowledge of general coding from a particular library is likely close to nil.

The hard part here imo would be convincingly arguing that you can wipe out knowledge of the library from the training set, whether through fine tuning or trying to exclude it from the dataset.

> But what about training without having seen any human written program? Coul a model learn from randomly generated programs?

I think the answer at this point is definitely no, but maybe someday. I think it's a more interesting question for art since it's more subjective, if we eventually get to a point where a machine can self-teach itself art from nothing... first of all how, but second of all it would be interesting to see the reaction from people opposed to AI art on the basis of it training off of artists.

Honestly given all I've seen models do, I wouldn't be too surprised if you could somehow distill a (very bad) image generation model off of just an LLM. In a sense this is the end goal of the pelican riding a bicycle (somewhat tongue in cheek), if the LLM can learn to draw anything with SVGs without ever getting visual inputs then it would be very interesting :)