pastor_williams 4 hours ago
It isn't just out of the kindness of their hearts that they don't do this. There are laws and regulations. There is also legal risk and reputation. I have to go through a legal and privacy process at my big corp job whenever I want to record a new timestamp, and I need to ensure that the data is used appropriately and that it is wiped later. I've only seen these compliance requirements become more onerous over the past ten years and I expect that to continue.
Terr_ 3 hours ago
> There are laws and regulations. There is also legal risk and reputation.

One of the big companies, Meta, already decided to go ahead and grab terabytes of pirated books to feed their LLM. [0] Therefore I would not give them (or similar entities) the benefit of the doubt when it comes to how they might use text that customers "gave" them under some unreadably-favorable terms of service.

With PII, the pirated-books example is doubly relevant, because the accusation of "this output is reproducing my copyrighted work" is very similar to "this output is revealing my private data". The fuzzy black-box nature of the algorithms offers ways to stymie enforcement, by arguing that victims or regulators cannot conclusively prove a chain of cause with zero coincidences.

[0] https://www.theatlantic.com/technology/archive/2025/03/libge...