er4hn 8 hours ago

I think the author makes some interesting points, but I'm not that worried about this. These tools feel symmetric for defenders to use as well. There's an easy to see path that involves running "LLM Red Teams" in CI before merging code or major releases. The fact that it's a somewhat time expensive (I'm ignoring cost here on purpose) test makes it feel similar to fuzzing for where it would fit in a pipeline. New tools, new threats, new solutions.
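As a rough sketch of where such a gate could sit, here is a toy merge pipeline with a time-boxed red-team stage. All stage names and checks are hypothetical stand-ins; a real stage would invoke an LLM harness under a fixed time/token budget, the way fuzzers get a fixed budget:

```python
# Sketch of an "LLM red team" stage in a CI pipeline, analogous to a
# time-boxed fuzzing stage. Stage names and checks are hypothetical.
def ci_pipeline(change, stages):
    """Run each stage in order; stop at the first failure, like a merge gate."""
    for name, check in stages:
        ok, report = check(change)
        if not ok:
            return f"blocked at {name}: {report}"
    return "merged"

# Stand-in checks: a real red-team stage would call out to an LLM harness
# rather than do a string match on the change description.
stages = [
    ("unit-tests", lambda c: (True, "")),
    ("llm-redteam", lambda c: ("sqli" not in c, "possible SQL injection")),
]

print(ci_pipeline("add login form with sqli", stages))
# -> blocked at llm-redteam: possible SQL injection
```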

digdugdirk 6 hours ago | parent | next [-]

That's not how complex systems work though? You say that these tools feel "symmetric" for defenders to use, but having both sides use the same tools immediately puts the defenders at a disadvantage in the "asymmetric warfare" context.

The defensive side needs everything to go right, all the time. The offensive side only needs something to go wrong once.

Vetch 4 hours ago | parent [-]

I'm not sure that's quite the right mental model. They're not searching randomly with unbounded compute, nor selecting from arbitrary strategies. Both sides are using LLMs, likely the same ones, so they will likely uncover overlapping solutions. Avoiding that depends on exploring more of the tail of distributions that are highly correlated, possibly identical.

It's a subtle difference from what you said: it's not that everything has to go right in sequence for the defensive side. Defenders just have to invest enough in searching that the offensive side has a significantly lower chance of finding solutions the defenders did not. Both attackers and defenders are attacking the same target program and sampling the same distribution of attacks; the defender is simply also iterating on patching any found exploits until their budget is exhausted.
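A toy Monte Carlo version of this argument (all numbers and the Zipf-like bug distribution are illustrative assumptions, not from the thread): both sides sample bugs from the same skewed distribution, the defender patches whatever it finds, and the attacker wins if it draws any bug the defender missed.

```python
import random

random.seed(0)

N_BUGS = 1000
# Zipf-like weights: bug i is found with probability proportional to 1/(i+1),
# so "easy" bugs are found by both sides and the contest is over the tail.
weights = [1.0 / (i + 1) for i in range(N_BUGS)]

def search(budget):
    """Sample `budget` draws from the shared distribution; return distinct bugs found."""
    return set(random.choices(range(N_BUGS), weights=weights, k=budget))

def attacker_wins(defender_budget, attacker_budget, trials=2000):
    """Fraction of trials in which the attacker finds a bug the defender missed."""
    wins = 0
    for _ in range(trials):
        patched = search(defender_budget)   # defender patches everything it finds
        if search(attacker_budget) - patched:
            wins += 1
    return wins / trials

# A larger defender budget shrinks the attacker's chance of finding an
# unpatched tail bug, but rarely drives it to zero.
for d in (100, 1000, 10000):
    print(d, attacker_wins(d, 100))
```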

lateral_cloud 36 minutes ago | parent | prev | next [-]

Defenders have the added complexity of operating within business constraints like CAB/change control and uptime requirements. Threat actors don’t, so they can move quickly and operate at scale.

azakai 6 hours ago | parent | prev | next [-]

Yes, and these tools are already being used defensively, e.g. in Google's Big Sleep:

https://projectzero.google/2024/10/from-naptime-to-big-sleep...

List of vulnerabilities found so far:

https://issuetracker.google.com/savedsearches/7155917

hackyhacky 7 hours ago | parent | prev | next [-]

> I think the author makes some interesting points, but I'm not that worried about this.

Given the large amount of unmaintained or outdated software out there, I think being worried is the right approach.

The only guaranteed winner is the LLM companies, who get to sell tokens to both sides.

pixl97 5 hours ago | parent [-]

I mean, you're leaving out large nation-state entities.

pizlonator an hour ago | parent | prev | next [-]

Not symmetric at all.

There are countless bugs to find.

If the offender runs these tools, then any bug they find becomes a cyberweapon.

If the defender runs these tools, they will not thwart the offender unless they find and fix all of the bugs.

Any vs. all is not symmetric.
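The any-vs-all gap can be made concrete with toy numbers (the bug count and per-bug find probability below are illustrative assumptions): with n exploitable bugs, each found independently with probability p, the defender must find all n while the attacker only needs one the defender missed.

```python
# Toy arithmetic for "any vs. all" (numbers are illustrative, not from the thread).
p, n = 0.9, 20                         # per-bug find probability, number of bugs

p_defender_finds_all = p ** n          # defender must find every bug
p_some_bug_missed = 1 - p_defender_finds_all   # attacker needs only one miss

print(round(p_defender_finds_all, 3), round(p_some_bug_missed, 3))
# -> 0.122 0.878
```

Even at a 90% per-bug find rate, the defender clears the "all" bar only about 12% of the time.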

0xDEAFBEAD an hour ago | parent [-]

How do bug bounties change the calculus? Assuming rational white hats who will report every bug which costs fewer LLM tokens than the bounty, on expectation.
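That decision rule can be written down directly (all dollar and token figures below are hypothetical examples): a rational white hat hunts when the expected bounty payout exceeds the expected token spend.

```python
# Hypothetical white-hat decision rule: hunt only if expected bounty
# exceeds expected token cost. All numbers are made-up examples.
def worth_hunting(bounty_usd, tokens_per_attempt, usd_per_mtok, p_find):
    """Expected value of one search attempt: p(find) * bounty - token cost > 0."""
    cost = tokens_per_attempt / 1e6 * usd_per_mtok
    return p_find * bounty_usd - cost > 0

# e.g. a $5,000 bounty, 50M tokens per attempt at $10/Mtok, 20% hit rate:
# expected payout $1,000 vs. $500 in tokens.
print(worth_hunting(5000, 50_000_000, 10, 0.2))   # -> True
```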

pizlonator 34 minutes ago | parent [-]

They don’t.

For the calculus to change, anyone running an LLM to find bugs would have to be able to find all of the bugs that anyone else running an LLM could ever find.

That’s not going to happen.

0xDEAFBEAD 30 minutes ago | parent [-]

Correct me if I'm wrong, but I think a better mental model would be something like: Take the union of all bugs found by all white hats, fix all of those, then check if any black hat has found sufficient unfixed bugs to construct an exploit chain?
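That model is just set algebra (the CVE names below are hypothetical): patch the union of everything the white hats report, then ask whether any black-hat chain survives with every link unpatched.

```python
# Toy version of the union-of-reports model. All CVE names are hypothetical.
white_hat_reports = [{"CVE-A", "CVE-B"}, {"CVE-B", "CVE-C"}]
patched = set().union(*white_hat_reports)   # fix the union of all reports

# A black-hat exploit chain only works if every link is still unpatched.
black_hat_chain = {"CVE-C", "CVE-D"}
chain_viable = black_hat_chain.isdisjoint(patched)

print(sorted(patched), chain_viable)
# -> ['CVE-A', 'CVE-B', 'CVE-C'] False   (chain broken: CVE-C was reported and fixed)
```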

SchemaLoad 7 hours ago | parent | prev | next [-]

This + the fact that software and hardware have been getting structurally more secure over time. New changes like language safety features, Memory Integrity Enforcement, etc. will significantly raise the bar on the difficulty of finding exploits.

amelius 7 hours ago | parent | prev [-]

> These tools feel symmetric for defenders to use as well.

Why? The attackers can run the defending software as well. As such they can test millions of test cases, and if one breaks through the defenses they can make it go live.

er4hn 4 hours ago | parent | next [-]

Right, that's the same situation as fuzz testing today, which is why I compared it. I feel like you're gesturing toward "attackers only need to get lucky once, defenders need to do a good job every time," but often when you apply techniques like fuzz testing it doesn't take much effort to get good coverage. I suspect a similar situation will play out with LLM-assisted attack generation. For higher-value targets based on OSS, there are projects like Google Big Sleep to bring enhanced resources.

execveat 6 hours ago | parent | prev [-]

Defenders have threat modeling on their side. With access to source code, design docs, configs, infra, and actual requirements, plus the ability to redesign or choose the architecture and dependencies for the job, there's a lot that gives the defending side an advantage.

I'm quite optimistic about AI ultimately making systems more secure and well protected, shifting the overall balance towards the defenders.