> I was going to have to bring myself up to speed with LLMs

What did you do beyond playing around with them?

> Of course it works. Why would anybody think otherwise?

Sam Altman is a liar. The folks pitching AI as an investment were previously flinging SPACs and crypto. (And can usually speak to anything technical about AI as competently as battery chemistry or Merkle trees.) Copilot and Siri overpromised and underdelivered. Vibe coders are mostly idiots.

The bar for believability in AI is about as high as its frontier's actual achievements.

▲

tptacek an hour ago | parent | next [-]

I still haven't worked out for myself where my career is going with respect to this stuff. I have like 30% of a prototype/POC active testing agent (basically, Burp Suite but as an agent), but I haven't had time to move it forward over the last couple months.

In the intervening time, one of the beliefs I've acquired is that the gap between effective use of models and marginal use is asking for ambitious enough tasks, and that I'm generally hamstrung by knowing just enough about anything they'd build to slow everything down. In that light, I think doing an agent to automate the kind of bugfinding Burp Suite does is probably smallball.

Many years ago, a former collaborator of mine found a bunch of video driver vulnerabilities by using QEMU as a testing and fault injection harness. That kind of thing is more interesting to me now. I once did a project evaluating an embedded OS where the modality was "port all the interesting code from the kernel into Linux userland processes and test them directly". That kind of thing seems especially interesting to me now too.

▲

azakai 2 hours ago | parent | prev [-]

Plenty of reasons to be skeptical, but also we know that LLMs can find security vulnerabilities since at least 2024:

https://projectzero.google/2024/10/from-naptime-to-big-sleep...

Some followup findings reported in point 1 here from 2025:

https://blog.google/innovation-and-ai/technology/safety-secu...

So what Anthropic are reporting here is not unprecedented. The main thing they are claiming is an improvement in the amount of findings. I don't see a reason to be overly skeptical.

	▲	jsnell an hour ago \| parent [-]
		I'm not sure the volume here is particularly different to past examples. I think the main difference is that there was no custom harness, tooling or fine-tuning. It's just the out of the box capabilities for a generally available model and a generic agent.