▲ | alganet 2 days ago
It doesn't say anything about an open training corpus. The USA supposedly has the most data in the world. Companies cannot (in theory) train on integrated sets of information. The USA, and China to some extent, can train on large amounts of information that is not public. The USA in particular has been known for keeping a vast repository of metadata (data about data) about all sorts of things. This data is very refined and organized (PRISM, etc.). This allows training for purposes that might not be obvious when inspecting the open weights or the source of the inference engine. It is a double-edged sword, though. If anyone were able to identify such non-obvious training inserts and extract information about them, or prove they were maliciously placed, it could backfire tremendously.
▲ | vharuck 2 days ago | parent
So DOGE might not be consolidating and linking data just for ICE, but also to provide it to companies as a training corpus? In normal times, I'd laugh that off as a paranoiac fever dream.