How do you prove the training data didn't contain the code?
I'd assume an LLM trained on the original code would also be contaminated.