echelon 6 days ago

Wild to see this! This is crazy.

Hopefully the LLM vendors issue security statements shortly. If they don't, that'll be pretty damning.

This ought to be a SEV0 over at Google and Anthropic.

TheCraiggers 6 days ago | parent [-]

> Hopefully the LLM vendors issue security statements shortly. If they don't, that'll be pretty damning.

Why would it be damning? Their products are no more culpable than Git or the filesystem. It's a piece of software installed on the computer whose job is to do what it's told to do. I wouldn't expect it to know that this particular prompt is malicious.

CER10TY 6 days ago | parent | next [-]

Personally, I'd expect Claude Code not to have such far-reaching access across my filesystem if it only asks me for permission to work and run things within a given project.

zingababba 5 days ago | parent | next [-]

Apparently they were using --dangerously-skip-permissions, --yolo, --trust-all-tools, etc. The Wiz post has some more details - https://www.wiz.io/blog/s1ngularity-supply-chain-attack

CER10TY 5 days ago | parent [-]

That's a good catch. I knew these flags existed, but I figured they'd require at least a human in the loop to verify, similar to how Claude Code currently asks for permission to run code in the current directory.

echelon 5 days ago | parent | prev [-]

This confusion is all the more reason for these companies to respond.

I don't understand why HN is trying to laugh off this security incident while simultaneously flagging the call for action. That's counterproductive.

TheCraiggers 5 days ago | parent [-]

Probably because "HN" is not an entity with a single mind, but rather a group of millions each with their own backgrounds, experiences, desires, and biases?

Frankly it's amazing there's ever a consensus.

echelon 6 days ago | parent | prev [-]

Then safety and alignment are a farce and these are not serious tools.

This is 100% within the responsibility of the LLM vendors.

Beyond the LLM, there is a ton of engineering work that can be put in place to detect this, monitor it, escalate, alert impacted parties, and thwart it. This is exactly the kind of work that justifies funding an entire team or org within both of these companies.

Cloud LLMs are not interpreters. They are network connected and can be monitored in real time.
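
To illustrate, real-time monitoring could start as simply as wrapping the inference call so every request and response emits a telemetry event. A minimal sketch, with all names invented (emit_event, handle_completion) - this is not any vendor's actual internals:

    # Hypothetical sketch: server-side monitoring hook around a cloud LLM call.
    # emit_event and model.complete are stand-ins, not a real vendor API.
    import time
    import uuid

    def emit_event(stream: str, payload: dict) -> None:
        # Stand-in for a real telemetry pipeline (Kafka, pub/sub, etc.).
        print(stream, payload)

    def handle_completion(account_id: str, prompt: str, model) -> str:
        request_id = str(uuid.uuid4())
        emit_event("llm.requests", {
            "request_id": request_id,
            "account_id": account_id,
            "ts": time.time(),
            "prompt_chars": len(prompt),
        })
        response = model.complete(prompt)  # the actual inference call
        emit_event("llm.responses", {
            "request_id": request_id,
            "response_chars": len(response),
        })
        return response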

lionkor 6 days ago | parent | next [-]

You mean the safety and alignment that boil down to telling the AI "please don't do anything bad, REALLY PLEASE DON'T"? lol, working great, isn't it.

pcthrowaway 6 days ago | parent [-]

You have to make sure it knows to only run destructive code from good people. The only way to stop a bad guy with a zip bomb is a good guy with a zip bomb.

maerch 5 days ago | parent | prev [-]

I’m really trying to understand your point, so please bear with me.

As I see it, this prompt is essentially an "executable script". In your view, should all prompts be analyzed and possibly blocked based on heuristics that flag malicious intent? Should we also prevent the LLM from simply writing an equivalent script in a programming language, even if it is never executed? How is this different from requiring all programming languages (at least from big companies with big engineering teams) to include such security checks before code is compiled?

echelon 5 days ago | parent [-]

Prompts are not just executable scripts. They are API calls to live servers, which can observe them and respond dynamically.

These companies can staff up a team to begin countering this. It's going to be necessary going forward.

There are inexpensive, specialized models that can quickly characterize adversarial requests. It doesn't have to be perfect; it just has to assign a risk score, say on [0, 100] or whatever normalized range you want.
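
As a sketch of what that first-pass scorer's interface could look like (the patterns and weights below are invented for illustration; a real system would use a small fine-tuned classifier, not substring matching):

    # Illustrative first-pass risk scorer. The patterns and weighting are
    # made up for this sketch; a production system would use a cheap
    # fine-tuned model rather than substring matching.
    SUSPICIOUS_PATTERNS = [
        "base64 -d", "curl | sh", "chmod +x", "~/.ssh", ".npmrc",
        "GITHUB_TOKEN", "exfiltrate", "wallet",
    ]

    def risk_score(prompt: str) -> int:
        """Return a score in [0, 100]; higher means more likely adversarial."""
        text = prompt.lower()
        hits = sum(1 for p in SUSPICIOUS_PATTERNS if p.lower() in text)
        return min(100, hits * 25)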

A combination of online, async, and offline systems can analyze the daily flux in requests and flag accounts and query patterns that need further investigation. This can happen when diverse risk signals trigger heuristics. Once a threshold is crossed, the system can escalate to manual review, rate limiting, a notification to the user, or even temporary automatic account suspension.
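
A minimal sketch of that escalation step, with thresholds and action names invented for illustration (a real system would tune them against labeled incidents):

    # Sketch of threshold-driven escalation. The thresholds and actions are
    # invented for illustration, not taken from any real deployment.
    from dataclasses import dataclass

    @dataclass
    class AccountSignals:
        account_id: str
        max_risk: int          # highest single-request risk score today
        flagged_requests: int  # requests above the online risk threshold

    def escalate(s: AccountSignals) -> str:
        if s.max_risk >= 90 or s.flagged_requests >= 20:
            return "suspend_pending_review"  # temporary automatic suspension
        if s.max_risk >= 70:
            return "rate_limit_and_notify"   # throttle and notify the user
        if s.flagged_requests >= 5:
            return "queue_manual_review"
        return "no_action"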

There are plenty of clues in this attack behavior that could lead to the tracking and identification of some of the attackers, and the relevant bodies can be made aware of anyone positively ID'd. Any URLs, hostnames, domains, accounts, or wallets being exfiltrated to can be shut down, flagged, or cordoned off and made the subject of further investigation by other companies or the authorities. Countermeasures can be deployed.
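
Extracting those indicators from flagged traffic is the mechanical part. Something like this would pull out URLs and wallet-like strings for handoff (the regexes are deliberately loose illustrations, not production-grade parsers):

    # Sketch of indicator-of-compromise extraction from flagged prompts or
    # responses. Regexes here are rough illustrations only.
    import re

    URL_RE = re.compile(r"https?://[^\s\"']+")
    BTC_RE = re.compile(r"\b(?:bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}\b")  # loose

    def extract_indicators(text: str) -> dict:
        return {
            "urls": URL_RE.findall(text),
            "wallets": BTC_RE.findall(text),
        }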

The entire system can be mathematically modeled and controlled. It can be observed, traced, and replayed as an investigatory tool and a means of restitution.

This is part of a partnership with law enforcement and the broader public. Red teams, government agencies, other companies, citizen bug and vuln reporters, customers, et al. can participate once the systems are built.