gortok 4 hours ago

There are two issues I see here (besides the obvious “Why do we even let this happen in the first place?”):

1. What happened to all the data Copilot trained on that was confidential? How is that data separated and deleted from the model’s training? How can we be sure it’s gone?

2. This issue was found, but without a much better security posture from Microsoft, we have no way of knowing what issues are currently lurking that are as bad as, if not worse than, what happened here.

There’s a fundamental flaw in the thinking and the misguided incentives that led to “sprinkle AI everywhere.” Instead of taking a step back and rethinking that approach, we’re going to get pieced-together fixes and still be left with the foundational problem: everyone’s data is just one prompt injection away from being taken, whether it’s labeled as “secure” or not.
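To make the “one prompt injection away” point concrete, here is a minimal sketch (not Copilot’s actual pipeline; every function and field name is a hypothetical illustration) of why a sensitivity label alone does nothing once content reaches the model’s prompt, and why any label/DLP check has to happen before retrieval results are concatenated in:

```python
# Conceptual sketch only -- not Microsoft's implementation.
# Shows labeled data leaking into an LLM prompt unless filtered first.

def build_prompt(user_question: str, retrieved_emails: list[dict]) -> str:
    """Naively concatenate retrieved email bodies into the model prompt."""
    context = "\n---\n".join(e["body"] for e in retrieved_emails)
    return f"Context:\n{context}\n\nQuestion: {user_question}"

def build_prompt_with_label_check(user_question: str,
                                  retrieved_emails: list[dict]) -> str:
    """Same retrieval step, but drop anything carrying a sensitivity label
    BEFORE it enters the prompt; once the model has seen the text,
    no downstream policy can un-see it."""
    allowed = [e for e in retrieved_emails if e.get("label") != "Confidential"]
    return build_prompt(user_question, allowed)

emails = [
    {"body": "Q3 layoff plan attached.", "label": "Confidential"},
    {"body": "Lunch at noon?", "label": None},
]

unsafe = build_prompt("Summarize my inbox", emails)
safe = build_prompt_with_label_check("Summarize my inbox", emails)
assert "layoff" in unsafe       # labeled data reaches the model
assert "layoff" not in safe     # filtered before the model sees it
```

The toy filter is the easy part; the hard part, and the apparent failure here, is that the enforcement point actually sits in the retrieval path at all.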

carefulfungi 3 hours ago | parent | next [-]

> "The Microsoft 365 Copilot 'work tab' Chat is summarizing email messages even though these email messages have a sensitivity label applied and a DLP policy is configured."

I'd add (3) - a DLP policy is apparently ineffective at its purpose: monitoring data sharing between machines. (https://learn.microsoft.com/en-us/purview/dlp-learn-about-dl...).

Directly from the DLP feature page:

> DLP, with collection policies, monitors and protects against oversharing to Unmanaged cloud apps by targeting data transmitted on your network and in Microsoft Edge for Business. Create policies that target Inline web traffic (preview) and Network activity (preview) to cover locations like:

> OpenAI ChatGPT—for Edge for Business and Network options

> Google Gemini—for Edge for Business and Network options

> DeepSeek—for Edge for Business and Network options

> Microsoft Copilot—for Edge for Business and Network options

> Over 34,000 cloud apps in the Microsoft Defender for Cloud Apps cloud app catalog—Network option only

caminante an hour ago | parent [-]

> a DLP policy is apparently ineffective at its purpose

/Offtopic

Yes, MSFT's DLP software malfunctioned, but getting users to MANUALLY classify things as confidential is already an uphill battle. Those labels only come from the rare subset of people who are aware of, and compliant with, their NDAs/confidentiality agreements!

ImPostingOnHN an hour ago | parent [-]

Who can blame them, when in the end, it gets ignored anyways?

doctorpangloss 4 hours ago | parent | prev [-]

All the vendors paraphrase user data, then use the paraphrased data for training. This is what their terms of service say.

They have significant experience in this. Microsoft software since 2014, for the most part, is also paraphrased from other people's code they find lying around online.

benterix 3 hours ago | parent | next [-]

> All the vendors paraphrase user data, then use the paraphrased data for training. This is what their terms of service say.

It depends. E.g. OpenAI says: "By default, we do not train on any inputs or outputs from our products for business users, including ChatGPT Team, ChatGPT Enterprise, and the API."[0]

[0] https://openai.com/policies/how-your-data-is-used-to-improve...

moritzwarhier an hour ago | parent | prev [-]

> Microsoft software since the 2014, for the most part, is also paraphrased from other people's code they find laying around online.

That was pretty funny and explains a lot.

I wish I could do more :(

Instead I always break things when I paraphrase code without the GeniusParaphrasingTool

nyrikki 26 minutes ago | parent [-]

This is exactly why I moved to self hosted code in 2017.

While I couldn’t have predicted the future, even classic data mining posed a risk.

It is just reality that if you give a third party access to your data, you should expect them to use it.

It is just too tempting of a value stream and legislation just isn’t there to avoid the EULA trap.

I was targeting a market where a fraction of a percentage point of advantage mattered, which drove what was, at the time, labeled paranoia.