It doesn't say anything about an open training corpus.

The USA supposedly has the most data in the world. Companies cannot (in theory) train on integrated sets of information. The USA, and China to some extent, can train on large amounts of information that is not public. The USA in particular is known for keeping a vast repository of metadata (data about data) about all sorts of things. This data is highly refined and organized (PRISM, etc.).

This allows training for purposes that might not be obvious when observing the open weights or the source of the inference engine.

It is a double-edged sword though. If anyone is able to identify such non-obvious training inserts and extract information about them or prove they were maliciously placed, it could backfire tremendously.

▲

vharuck 2 days ago | parent [-]

So DOGE might not be consolidating and linking data just for ICE, but for providing to companies as a training corpus? In normal times, I'd laugh that off as a paranoiac fever dream.

▲

dudeinjapan 2 days ago | parent | next [-]

If AI were trained on troves of personal info like SSNs, emails, and phone numbers, then the leakage would be easily discovered and the model would be worthless for any commercial/mass-consumption purpose. (This doesn't rule out a PRISM-AI for NSA purposes, of course.)

	▲	alganet a day ago | parent [-]
		The way you describe it makes PRISM sound like a contact book. I think it's more like an unwilling Facebook.

▲

alganet a day ago | parent | prev [-]

Companies can change hands more easily than governments. I would assume the US isn't sharing anything exclusive with private commercial entities. Doing so would be a mistake, in my opinion.