Superhuman AI Exfiltrates Emails

observationist 3 hours ago | parent | next [-]

As limited as they are, LLMs are demonstrably smarter than a whole lot of people, and the number of people more clever than the best AI is going to dwindle, rapidly, especially in the domain of doing sneaky shit really fast on a computer.

There are countless examples of schemes in stories where codes and cryptography are used to exfiltrate information and evade detection, and these models are trained on every last piece of technical, practical text humanity has produced on the subject. All they have to do is contextualize what's likely being done to check and mash together two or three systems it thinks is likely to go under the radar.

▲

0xferruccio a day ago | parent | prev | next [-]

The primary exfiltration vector for LLMs is making network requests via images with sensitive data as parameters.

As Claude Code increasingly uses browser tools, we may need to move away from .env files to something encrypted, kind of like rails credentials, but without the secret key in the .env

▲

SahAssar a day ago | parent | next [-]

So you are going to take the untrusted tool that kept leaking your secrets, keep the secrets away from it but still use it to code the thing that uses the secrets? Are you actually reviewing the code it produces? In 99% of cases that's a "no" or a soft "sometimes".

▲

TeMPOraL 3 hours ago | parent [-]

That's exactly what one does with their employees when one deploys "credential vaults", so?

▲

SahAssar 3 hours ago | parent [-]

Employees are under contract and are screened for basic competence. LLMs aren't and can't be.

▲

TeMPOraL 3 hours ago | parent [-]

> Employees are under contract and are screened for basic competence. LLMs aren't

So perhaps they should be.

> and can't be.

Ah but they must, because there's not much else you can do.

You can't secure LLMs like they were just regular, narrow-purpose software, because they aren't. They're by nature more like little people on a chip (this is an explicit design goal) - and need to be treated accordingly.

▲

majormajor 24 minutes ago | parent | next [-]

Sooo the primary way we enforce contracts and laws against people are things like fines and jail time.

How would you apply the threat of those to "little people on a chip", exactly?

Imagine if any time you hired someone there was a risk that they'd try to steal everything they could from your company and then disappear forever with you having no way to hold them to account? You'd probably stop hiring people you didn't already deeply trust!

Strict liability for LLM service providers? Well, that's gonna be a non-starter unless there's a lot of MAJOR issues caused by LLMs (look at how little we care about identity theft and financial fraud currently).

▲

SahAssar 3 hours ago | parent | prev [-]

> So perhaps they should be.

Unless both the legalities and technology radically change they will not be. And the companies building them will not take on the burden since the technology has proved to be so unpredictable (partially by design) and unsafe.

> designed to be more like little people on a chip - and need to be treated accordingly

Deeply unpredictable and unsafe people on a chip, so not the sort that I generally want to trust secrets with.

I don't think it's that complex, you can have secure systems or you can have current gen LLMs. You can't have both in the same place.

▲

TeMPOraL 3 hours ago | parent [-]

> Deeply unpredictable and unsafe people on a chip, so not the sort that I generally want to trust secrets with.

Very true when comparing to acquaintances, but at a scale of any company or system except the tiniest ones, you can't blindly trust people in general either. Building systems involving people and LLMs is pretty similar.

> I don't think it's that complex, you can have secure systems or you can have current gen LLMs. You can't have both in the same place.

That is, indeed, the key. My point is that, unlike the popular opinion in threads like this, it does not follow that we need to give up on LLMs, or that we need to fix the security issues. The former is undesirable, the latter is fundamentally impossible.

What we need is what we've been doing ever since civilization took shape, ever since we've started building machines: recognize that automatons and people are different kinds of components, with different reliability and security characteristics. You can't blindly substitute one for the other, but there are ways to make them work together. Most systems we've created are of that nature.

What people still get wrong is treating LLMs as "automatons" components. They're not, they're "people" components.

	▲	SahAssar 2 hours ago \| parent [-]
		I think I generally agree, but I also think that treating them like people means that you expect reason, intelligence and a way to interrogate their way of "thinking" (very broad quotes here). I think LLMs are to be treated as something completely separate from both predictable machines ("automatons") and people. They have separate concerns and fitness for a use-case than both existing categories.

▲

xyzzy123 16 hours ago | parent | prev [-]

One tactic I've seen used in various situations is proxies outside the sandbox that augment requests with credentials / secrets etc.

Doesn't help in the case where the LLM is processing actually sensitive data, ofc.

▲

ineedasername 3 hours ago | parent | prev | next [-]

Why does an agent tasked with email summarizing have access to anything else? There’s plenty of difference between an agent and a background service or daemon but it’s at minimum got to be given the same restrictions in scope they would be, or an intern using your system for the same purpose. Developers need to bring the same ZTA mindset to agent permissions they would to building the other services and infrastructure they rely on.

	▲	rapind 2 hours ago \| parent [-]
		“Move fast and break things.” It’s funny you even need to ask on hacker news of all places. ;)

▲

sarelta a day ago | parent | prev | next [-]

I'm impressed Superhuman seems to have handled this so well - lots of big names are fumbling with AI vuln disclosures. Grammarly is not necessarily who I would have bet on to get it right

▲

empiko 18 hours ago | parent | next [-]

I wonder how they handled it. Everybody's connecfing their AI to the Web, but it automatically means that any data AI has access to can be extracted by the attacker. The only safe way forward is to 1. disconnect the Web or 2. perhaps to filter the generated URLs aggressively.

▲

ttoinou 17 hours ago | parent | next [-]

We should have a clearer view of permissions of the AI, operations it does, and have one button per day to accept/deny operations from given data. Instead of auto approval.

▲

wat10000 4 hours ago | parent | prev [-]

Private data, untrusted data, communication: an LLM can safely have two of these, but never all three.

Browsing the web is both communication and untrusted data, so it must never have access to any trusted data if it has the ability to browse the web.

The problem is, so much of what people want from these things involves having all three.

	▲	TeMPOraL 3 hours ago \| parent [-]
		> The problem is, so much of what people want from these things involves having all three. Pretty much. Also there's no way of "securing" LLMs without destroying the quality that makes them interesting and useful in the first place. I'm putting "securing" in scare quotes because IMO it's fool's errand to even try - LLMs are fundamentally not securable like regular, narrow-purpose software, and should not be treated as such.

▲

djaouen an hour ago | parent | prev [-]

Are you f*cking kidding me? Grammarly is like the best one!

▲

djaouen a day ago | parent | prev [-]

Programming used to prevent this by separating code from data. AI (currently) has no such safeguards.

▲

TeMPOraL 3 hours ago | parent [-]

Reality doesn't have a distinction between "code" and "data"; those are categories of convenience, and don't even have a proper definition (what is code and what is data depends on who's asking and why). Any such distinction requires mechanically enforcing it; AI won't have it, because it's not natural, and adding it destroys generality of the model.

▲

djaouen 2 hours ago | parent [-]

OK, then sequence your DNA and send it to me. I will make sure to use it as code!

▲

TeMPOraL an hour ago | parent [-]

Haha. But DNA is a very good example of what I'm talking about. It's both "code" and "data" at the same time - or rather, a perfect demonstration that these concepts don't exist in nature.

▲

djaouen an hour ago | parent [-]

Yes, but for me to use your DNA as code would be a major malfunction!

	▲	TeMPOraL 37 minutes ago \| parent [-]
		I get the joke, but it's also an incredibly interesting topic to ponder. Remember "Reflections on Trusting Trust"? Now consider that DNA itself needs a complex biomolecular machine to "compile" it into cells and organisms, and that this also embeds in them copies of the "compiler" itself. This raises the question of whether, and how much, information needed to build the organism is not explicitly encoded anywhere in the DNA itself, and instead accumulates in the replication mechanism and gets carried over implicitly. So for you to successfully use my DNA as code, without also borrowing the compiler from my body, would be a major scientific result, shining light on the questions outlined above. So in short: I'm happy to contribute my DNA if you cite me as co-author on the resulting paper :P.