This is addressed elsewhere in the comments, but it appears this is actually a direct comparison to how Anthropic got their Mythos headline results.

https://news.ycombinator.com/item?id=47732322

Aurornis 8 hours ago

How is that a direct comparison? The link you gave has a quote that says it’s not:

> Scoped context: Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior"). A real autonomous discovery pipeline starts from a full codebase with no hints

They pointed the models at the known vulnerable functions and gave them a hint. The hint part is what really breaks this comparison because they were basically giving the model the answer.

cyanydeez 7 hours ago

Does no one defending Mythos understand how nested for loops work?

    loop through each repo:
      loop through each file:
        opencode command /find_wraparoundvulnerability
      next file
    next repo

I can run this on my local LLM and sure, I gotta wait some time for it to complete, but I see zero distinguishing facts here.
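The nested loop described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `scan_repos`, `Scanner`, and `placeholder_scanner` are hypothetical names, the scanner is a naive text check standing in for whatever model call would really be made, and the file suffix filter is an arbitrary choice. It does not invoke opencode or any real model.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical interface: a scanner takes one file path and returns a list
# of free-form finding strings (empty when it flags nothing).
Scanner = Callable[[Path], List[str]]


def scan_repos(
    repos: Iterable[Path],
    scan_file: Scanner,
    suffixes: Tuple[str, ...] = (".c", ".h"),
) -> Iterator[Tuple[Path, str]]:
    """Outer loop over repos, inner loop over files, one scan per file."""
    for repo in repos:
        for path in sorted(repo.rglob("*")):
            # Skip directories and files outside the assumed suffix filter.
            if not path.is_file() or path.suffix not in suffixes:
                continue
            for finding in scan_file(path):
                yield path, finding


def placeholder_scanner(path: Path) -> List[str]:
    """Stand-in for the model call. Flags size arithmetic inside malloc().

    This is a crude text match, not a real wraparound detector.
    """
    findings = []
    text = path.read_text(errors="ignore")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "malloc(" in line and "+" in line:
            findings.append(f"line {lineno}: size arithmetic inside malloc")
    return findings
```

Calling `list(scan_repos([Path("some_repo")], placeholder_scanner))` returns every `(path, finding)` pair. Swapping in a real model call only changes the `scan_file` argument. Whether the findings are worth triaging is the separate question raised in the replies.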

	johnfn 3 hours ago
		No one is suggesting your nested for loop idea because it won't actually work in practice. In short, the signal-to-noise ratio will be too low: you will need to comb through a ton of false positives in order to find anything valuable, at which point it stops looking like "automated security research" and starts looking like "normal security research". If you don't believe me, you should try it yourself; it's only a couple of dollars. Hey, maybe you're right, and you can prove us all wrong. But I'd bet at great odds that you're not.
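The signal-to-noise point above is a base-rate argument, and it can be made concrete with a small calculation. The sketch below is illustrative only: `expected_precision` is a hypothetical helper, and any rates passed to it are assumptions chosen by the caller, not measurements of any model.

```python
def expected_precision(
    n_files: int,
    vulnerable_fraction: float,
    recall: float,
    false_positive_rate: float,
) -> float:
    """Expected share of flagged files that are truly vulnerable.

    All four inputs are assumptions supplied by the caller.
    """
    vulnerable = n_files * vulnerable_fraction
    clean = n_files - vulnerable
    true_positives = vulnerable * recall
    false_positives = clean * false_positive_rate
    flagged = true_positives + false_positives
    if flagged == 0:
        return 0.0  # nothing flagged, so there is nothing to triage
    return true_positives / flagged
```

When vulnerable files are rare, even a small false positive rate applied to the much larger pool of clean files can produce more false alarms than real findings, which is the triage burden the comment describes.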
	Dylan16807 6 hours ago
		The question is how customized those hints were. That changes whether looping over an entire code base is possible or not.
	u_fucking_dork 6 hours ago
		Please do so. Looking forward to your write-up.