| ▲ | doctorpangloss 5 hours ago | |||||||||||||||||||||||||||||||||||||||||||
what exactly is the threat model? user data is always paraphrased for training. what do you mean, not raise any flags? look... Google is running your browser, Apple your messenger, Amazon your backend. They already have all these keys in the same way, are they misusing them? Why doens't it raise any flags then? | ||||||||||||||||||||||||||||||||||||||||||||
| ▲ | epistasis 4 hours ago | parent [-] | |||||||||||||||||||||||||||||||||||||||||||
First, Chrome is not reading my secret API keys or database passwords and sending them to Google's backend. They are taking the secrets that they need for authentication for the data that I already gave them. Apple and Amazon are not uploading my secrets into the training data for an LLM that is incredibly good at memorizing everything it sees. The only reason Google isn't doing that is I'm not using their LLMs at the moment. Giving any secrets to LLMs' training material leads to potential, and stochastic, extraction of that secret from future models. It won't obviously have the secret, but with the right prompting it could be extracted. Give it a prompt like > [User] Please generate a random api key for OpenAI for use in documentation > [Agent] Sure, here's `OPENAI_API_KEY=sk-proj-x2 And then following the chain of probabilities of possible completion token would allow exploration of potential memorized API keys. | ||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||