tptacek a day ago
I have never heard of "Heidy Khlaaf, chief AI scientist at the AI Now Institute", but the sentiment in this article is diametrically opposed to that of the vulnerability research scene. There is contention among vulnerability researchers about the impact of Mythos! But it's not "are frontier models going to shake up vulnerability research and let loose a deluge of critical vulnerabilities" --- software security people overwhelmingly believe that to be true. Rather, it's whether Mythos is truly a step change from 4.7 and 5.5.

For vulnerability researchers, the big "news" wasn't Mythos, but rather Carlini's talk from Unprompted, where he got on stage and showed his dumb-seeming "find me zero days" prompt, which actually worked. The big question for vulnerability people now isn't "AI or no AI"; it's "running directly off the model, or building fun and interesting harnesses".

Later I spoke with someone who has been professionally acquainted with Khlaaf. Khlaaf is a serious researcher, but not a software security researcher; it's not their field. I think what's happening here is that the BBC doesn't know the difference between AI safety prognosis and software security prognosis, or whom to talk to for each topic.
adrian_b a day ago
I doubt very much that a "find me zero days" prompt worked, because I am not aware of the slightest evidence of this. The Anthropic report that describes the bugs they found with Mythos in various open-source projects admits that a prompt like "find me zero days" does not work with Mythos.

To find bugs, they ran Mythos a large number of times on each file of the scanned project, with different prompts. They started with a more generic prompt intended to gauge whether the file was likely to contain bugs at all, in order to decide whether it was worthwhile to run Mythos many times on it. Then they used increasingly specific prompts to identify various classes of bugs. Finally, when it was reasonably certain that a bug existed, Mythos was run one more time with a prompt asking it to confirm the identified bug (and to produce an exploit or a patch).

Because what you say about Carlini is in obvious contradiction with Anthropic's own technical report about Mythos, I assume that it was either pure BS or a demo run on a fake program with artificial bugs. Or else the so-called prompt was not an LLM prompt at all, but merely the name of a command for a bug-finding harness that runs the LLM in a loop with various suitable prompts, as described by Anthropic.
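In code, that loop would look roughly like the sketch below. This is only an illustration of the multi-pass workflow described above; query_model, the prompt texts, and the run counts are my own assumptions, not anything taken from Anthropic's report or harness.

    # Minimal sketch of a multi-pass bug-finding harness: triage a file,
    # then run narrower bug-class prompts many times, then confirm findings.
    TRIAGE_PROMPT = ("Does this file plausibly contain a security-relevant bug "
                     "worth deeper analysis? Answer YES or NO.")
    CLASS_PROMPTS = [
        "Look specifically for out-of-bounds reads or writes in this file.",
        "Look specifically for use-after-free or double-free bugs in this file.",
        "Look specifically for integer overflow or truncation bugs in this file.",
    ]
    CONFIRM_PROMPT = ("Re-examine this candidate finding. If the bug is real, "
                      "produce a proof-of-concept input or a patch; otherwise "
                      "answer FALSE POSITIVE.\n\n{candidate}")

    def scan_file(path, query_model, runs_per_prompt=3):
        """query_model(prompt, source) -> str is whatever LLM call the harness wraps."""
        source = open(path).read()
        # Pass 1: cheap triage to decide whether the file deserves many runs.
        if not query_model(TRIAGE_PROMPT, source).strip().upper().startswith("YES"):
            return []
        findings = []
        # Pass 2: repeated runs with narrower, bug-class-specific prompts.
        for prompt in CLASS_PROMPTS:
            for _ in range(runs_per_prompt):
                candidate = query_model(prompt, source).strip()
                if not candidate:
                    continue
                # Pass 3: one more run asking the model to confirm the finding
                # and to back it with an exploit or a patch.
                verdict = query_model(CONFIRM_PROMPT.format(candidate=candidate), source)
                if "FALSE POSITIVE" not in verdict.upper():
                    findings.append((path, candidate, verdict))
        return findings

The point of staging it this way is cost: the cheap triage pass gates the expensive repeated runs, and the confirmation pass filters the model's false positives before a human ever looks at them.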