Until prompt injection is fixed, if it is ever, I am not plugging LLMs into anything. MCPs, IDEs, agents, forget it. I will stick with a simple prompt box when I have a question and do whatever with its output by hand after reading it.

▲

danpalmer 5 months ago | parent | next [-]

Prompt injection is unlikely to be fixed. I'd stop thinking about LLMs as software where you can with enough effort just fix a SQL injection vulnerability, and start thinking about them like you'd think about insider risk from employees.

That's not to say that they are employees or perform at that level, they don't, but it's to say that LLM behaviours are fuzzy and ill-defined, like humans. You can't guarantee that your users won't click on a phishing email – you can train them, you can minimise risk, but ultimately you have to have a range of solutions applied together and some amount of trust. If we think about LLMs this way I think the conversation around security will be much more productive.

▲

LegionMammal978 5 months ago | parent [-]

The thing that I'd worry about is that an LLM isn't just like a bunch of individuals who can get tricked, but a bunch of clones of the same individual who will fall for the same trick every time, until it gets updated. So far, the main mitigation in practice has been fiddling with the system prompts to patch up the known holes.

▲

thaumasiotes 5 months ago | parent [-]

> The thing that I'd worry about is that an LLM isn't just like a bunch of individuals who can get tricked, but a bunch of clones of the same individual who will fall for the same trick every time

Why? Output isn't deterministic.

▲

LegionMammal978 5 months ago | parent | next [-]

Perhaps not, but the same input will lead to the same distribution of outputs, so all an attacker has to do is design something that works with reasonable probability on their end, and everyone else's instances of the LLM will automatically be vulnerable. The same way a pest or disease can devastate a population of cloned plants, even if each one grows slightly differently.

▲

thaumasiotes 5 months ago | parent [-]

OK, but that's also the way attacking a bunch of individuals who can get tricked works.

▲

zwnow 5 months ago | parent [-]

For tricking individuals your first got to contact them somehow. To trick an LLM you can just spam prompts.

▲

thaumasiotes 5 months ago | parent [-]

You email them. It's called phishing.

▲

throwaway314155 5 months ago | parent | next [-]

Right and now there's a new vector for an old concept.

▲

zwnow 5 months ago | parent | prev [-]

Employees usually know to not click on random shit they get sent. Most mails alrdy get filtered before they even reach the employee. Good luck actually achieving something with phishing mails.

▲

thaumasiotes 5 months ago | parent [-]

When I was at NCC Group, we had a policy about phishing in penetration tests.

The policy was "we'll do it if the customer asks for it, but we don't recommend it, because the success rate is 100%".

	▲	bluefirebrand 5 months ago \| parent [-]
		How can you ever get that lower than 100% if you don't do the test to identify which employees need to be trained / monitored because they fall for phishing?

▲

Retr0id 5 months ago | parent | prev | next [-]

You can still experimentally determine a strategy that works x% of the time, against a particular model. And you can keep refining it "offline" until x=99. (where "offline" just means invisible to the victim, not necessarily a local model)

▲

33hsiidhkl 5 months ago | parent | prev [-]

It absolutely is deterministic, for any given seed value. Same seed = same output, every time, which is by definition deterministic.

▲

tough 5 months ago | parent [-]

only if temperature is 0, but are they truly determinstic? I thought transformer based llm's where not

▲

33hsiidhkl 5 months ago | parent [-]

temperature does not affect token prediction in the way you think. The seed value is still the seed value, before temperature calculations are performed. The randomness of an LLM is not related to its temperature. The seed value is what determines the output. For a specific seed value, say 42069, the LLM will always generate the same output, given the same input, given the same temperature.

	▲	tough 5 months ago \| parent [-]
		Thank you, I thought this wasn't the case (like it is with diffusion image models) TIL

▲

TechDebtDevin 5 months ago | parent | prev | next [-]

Cursor deleted my entire Linux user and soft reset my OS, so I dont blame you.

▲

sunnybeetroot 5 months ago | parent | next [-]

Cursor by default asks to execute commands, sounds like you had auto run commands on…

▲

raphman 5 months ago | parent | prev [-]

Why and how?

▲

tough 5 months ago | parent | next [-]

an agent does rm -rf /

i think i saw it do it or try it and my computer shut down and restarted (mac)

maybe it just deleted the project lol

these llms are really bad at keeping track of the real world, so they might think they're on the project folder but had just navigated back with cd to the user ~ root and so shit happens.

Honestly one should run only these on controlled env's like VM's or Docker.

but YOLO amirite

▲

margalabargala 5 months ago | parent | next [-]

That people allow these agents to just run arbitrary commands against their primary install is wild.

Part of this is the tool's fault. Anything like that should be done in a chroot.

Anything less is basically "twitch plays terminal" on your machine.

	▲	serf 5 months ago \| parent \| next [-]
		a large part of the benefit to an agentic ai is that it can coordinate tests that it automatically wrote on an existing code base, a lot of time the only way to get decent answers out of something like that is to let it run as bare metal as it can. I run cursor and the accompanying agents in a snapshot'd VM for this purpose. It's not much different than what you suggest, but the layer of abstraction is far enough for admin-privileged app testing, an unfortunate reality for certain personal projects. I haven't had a cursor install nuke itself yet, but I have had one fiddling in a parent folder it shouldn't have been able to with workspace protection on..
	▲	tough 5 months ago \| parent \| prev [-]
		codex at least has limitations on what folders can operate.

▲

TechDebtDevin 5 months ago | parent | prev [-]

This is what happened. I was testing claude 4 and asked it to create a simple 1K LOC fyne android app. I have my repos stored outside of my linux user so the work it created was preserved. It essentially created a bash file that cd ~ && rm -rf / . All settings reset and documents/downloads disappeared lmfao. I don't ever really use my OS as primary storage, and any config or file of importance is backed up twice so it wasn't a big deal, but it was quite perplexing for a sec.

	▲	tough 5 months ago \| parent [-]
		if you think deeply about it, its one kind of harakiri as an AI to remove the whole system you're operating on. Yeah Claude 4 can go too far some times

▲

TechDebtDevin 5 months ago | parent | prev [-]

rm -rf /

▲

M4v3R 5 months ago | parent | prev | next [-]

DeepMind recently did some great work in this area: https://news.ycombinator.com/item?id=43733683

The method they presented, if implemented correctly, apparently can effectively stop most prompt injection vectors

	▲	5 months ago \| parent [-]
		[deleted]

▲

johnisgood 5 months ago | parent | prev | next [-]

I keep it manual, too, and I think I am better off for doing so.

▲

hu3 5 months ago | parent | prev [-]

I would have the same caution, if my code was any special.

But the reality is I'm very well compensated to summon CRUD slop out of thin air. It's well tested though.

I wish good luck to those who steal my code.

▲

mdaniel 5 months ago | parent [-]

You say code as if the intellectual property is the thing an attacker is after, but my experience has been that folks often put all kinds of secrets in code thinking that the "private repo" is a strong enough security boundary

I absolutely am not implying you are one of them, merely that the risk is not the same for all slop crud apps universally

▲

tough 5 months ago | parent [-]

People doesn't know github can manage secrets in its environment for CI?

Antoher interesting fact is that most big vendors pay for gh to scan for leaked secrets and auto-revoke them if a public repo contains any (regex string matches sk-xxx <- its a stripe key

thats one of the reasons why vendors use unique greppable starts of api keys with their ID.name on it

▲

mdaniel 5 months ago | parent [-]

You're mistaking "know" with "care," since my experience has been that people know way more than they care

And I'm pretty certain that private repos are exempt from the platform's built-in secret scanners because they, too, erroneously think no one can read them without an invitation. Turns out Duo was apparently just silently invited to every repo : - \

	▲	tough 5 months ago \| parent [-]
		I also remember reading about how due to how the git backend works your private git repos branches could get exposed to the public, so yea don't treat a repository as a private password mananger good point the scanner doesnt work on private repos =(