Remix clone Hacker News

	_pdp_ 2 hours ago
		On EU data sovereignty: The OP is right. For that reason we started migrating all of our cloud-based services out of the USA and into EU data centers with EU companies behind them. We are basically 80% there. The remaining 20% are not the difficult ones; they are just not important enough to prioritise at this point, but the long-term intention is a 100% disconnect.

		On IDV security: When you send your document to an IDV company (be that in the USA or elsewhere), they do not automatically have the right to train on your data without explicit consent. There have been a few pretty big class action lawsuits around this in the past, but I also believe that the legal frameworks are simply not strong enough to deter abuse or negligence. That being said, everyone reading this must realise that with large datasets it is very likely in practice that some data gets mislabelled, and it is hard to prove that this is not happening at scale. At the end of the day it will be a query running against a database, and with huge volumes it might catch more than it should. Once the data is selected for training and trained on, it is impossible to undo the damage. You can delete the training artefact after the fact, of course, but the weights of the model have already been adjusted by that data unless you retrain from scratch, which nobody does. I think everyone should assume that their data, be that source code, biometrics, or whatever, is already being used for training without consent, and that we don't have the legal frameworks to protect against such actions; in fact we have the opposite. The only control you have is not to participate.