mikaraento 14 hours ago
That might be somewhat ungenerous unless you have more detail to provide. I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.
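Nobody outside those companies knows the exact mechanism, but a minimal sketch of that kind of filter, assuming a hashed n-gram index over the training corpus (the n-gram size, threshold, and names here are all made up for illustration), could look like:

    # Hypothetical sketch of an output filter that blocks near-verbatim
    # reproduction of training data via hashed n-gram overlap. The n-gram
    # size, threshold, and index structure are assumptions, not any
    # vendor's actual implementation.

    N = 8            # n-gram length in tokens (assumed)
    MAX_OVERLAP = 3  # consecutive matching n-grams before flagging (assumed)

    def ngrams(tokens, n=N):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def build_index(training_docs):
        """Hash every n-gram seen in the training corpus into a set."""
        index = set()
        for doc in training_docs:
            index.update(hash(g) for g in ngrams(doc.split()))
        return index

    def looks_like_reproduction(output_text, index):
        """Flag output containing a long run of n-grams seen in training."""
        run = 0
        for g in ngrams(output_text.split()):
            run = run + 1 if hash(g) in index else 0
            if run > MAX_OVERLAP:
                return True
        return False

    index = build_index(["the quick brown fox jumps over the lazy dog again and again"])
    print(looks_like_reproduction("the quick brown fox jumps over the lazy dog again and again", index))  # True
    print(looks_like_reproduction("a completely different sentence about foxes", index))  # False

A real system would presumably use something sturdier than Python's hash() and a set (e.g. a suffix array or Bloom filter at corpus scale), but the shape of the check is the same: suppress output once too long a span matches training data verbatim.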
TZubiri an hour ago
So it can produce the training data, just with enough changes or added magic dust to claim the output as one's own. Legally I think that works, but evidence in a court works differently than evidence in science. It's the same word; don't let that confuse you into mixing the two up.
guenthert 6 hours ago
Should they though? If the answer to a question^Wprompt happens to be in the training set, wouldn't it be disingenuous not to provide that?