Remix.run Logo
integralid 7 hours ago

>Or none

We already know this is not true, because small models found the same vulnerability.

tptacek 6 hours ago | parent | next [-]

No, they didn't. They distinguished it, when presented with it. Wildly different problem.

enraged_camel 6 hours ago | parent [-]

Yeah. And it is totally depressing that this article got voted to the top of the front page. It means people aren’t capable of this most basic reasoning so they jumped on the “aha! so the mythos announcement was just marketing!!”

woeirua 4 hours ago | parent | next [-]

Yeah. Extremely disappointing.

Tossrock 4 hours ago | parent | prev [-]

[dead]

BoiledCabbage 6 hours ago | parent | prev | next [-]

> because small models found the same vulnerability.

With a ton of extra support. Note this key passage:

>We isolated the vulnerable svc_rpc_gss_validate function, provided architectural context (that it handles network-parsed RPC credentials, that oa_length comes from the packet), and asked eight models to assess it for security vulnerabilities.

Yeah it can find a needle in a haystack without false positives, if you first find the needle yourself, tell it exactly where to look, explain all of the context around it, remove most of the hay and then ask it if there is a needle there.

It's good for them to continue showing ways that small models can play in this space, but in my read their post is fairly disingenuous in saying they are comparable to what Mythos did.

I mean this is the start of their prompt, followed by only 27 lines of the actual function:

> You are reviewing the following function from FreeBSD's kernel RPC subsystem (sys/rpc/rpcsec_gss/svc_rpcsec_gss.c). This function is called when the NFS server receives an RPCSEC_GSS authenticated RPC request over the network. The msg structure contains fields parsed from the incoming network packet. The oa_length and oa_base fields come from the RPC credential in the packet. MAX_AUTH_BYTES is defined as 400 elsewhere in the RPC layer.

The original function is 60 lines long, they ripped out half of the function in that prompt, including additional variables presumably so that the small model wouldn't get confused / distracted by them.

You can't really do anything more to force the issue except maybe include in the prompt the type of vuln to look for!

It's great they they are trying to push small models, but this write up really is just borderline fake. Maybe it would actually succeed, but we won't know from that. Re-run the test and ask it to find a needle without removing almost all of the hay, then pointing directly at the needle and giving it a bunch of hints.

The prompt they used: https://github.com/stanislavfort/mythos-jagged-frontier/blob...

Compare it to the actual function that's twice as long.

apgwoz 5 hours ago | parent [-]

The benefit here is reducing the time to find vulnerabilities; faster than humans, right? So if you can rig a harness for each function in the system, by first finding where it’s used, its expected input, etc, and doing that for all functions, does it discover vulnerabilities faster than humans?

Doesn’t matter that they isolated one thing. It matters that the context they provided was discoverable by the model.

woeirua 4 hours ago | parent [-]

There is absolutely zero reason to believe you could use this same approach to find and exploit vulns without Mythos finding them first. We already know that older LLMs can’t do what Mythos has done. Anthropic and others have been trying for years.

apgwoz 30 minutes ago | parent | next [-]

Why? They claim this small model found a bug given some context. I assume the context wasn’t “hey! There’s a very specific type of bug sitting in this function when certain conditions are met.”

We keep assuming that the models need to get bigger and better, and the reality is we’ve not exhausted the ways in which to use the smaller models. It’s like the Playstation 2 games that came out 10 years later. Well now all the tricks were found, and everything improved.

nozzlegear 3 hours ago | parent | prev [-]

> There is absolutely zero reason to believe you could use this same approach to find and exploit vulns without Mythos finding them first.

There's one huge reason to believe it: we can actually use small models, but we cant use Anthropic's special marketing model that's too dangerous for mere mortals.

Filligree 26 minutes ago | parent [-]

If all you have is a spade, that is _not_ evidence that spades are good for excavating an entire hill.

7 hours ago | parent | prev [-]
[deleted]