foo-bar-baz529 4 days ago
Then why did curl only find one new vulnerability thanks to Mythos, and a low-priority one at that? It’s clear that other models are quite capable of finding largely the same vulnerabilities, and that the main key is simply running a frontier model in a good harness to find vulnerabilities.
ChadNauseam 4 days ago
> Then why did curl only find one new vulnerability thanks to Mythos

Maybe there weren't that many serious vulnerabilities in curl? It's like asking why it didn't find any vulnerabilities in fn main() {println!("hello, world");}. Anyway, people who have used it seem to say that Mythos was better than other models at creating exploits. From Cloudflare: https://blog.cloudflare.com/cyber-frontier-models/

> When we ran other frontier models through the same harness, they found a fair number of the same underlying bugs, and in some cases they got further than we expected on the reasoning side too. Where they fell short was at the point of stitching the pieces together. A model would identify an interesting bug, write a thoughtful description of why it mattered, and then stop, leaving the actual chain unfinished and the question of exploitability open. What changed with Mythos Preview is that a model can now take those low-severity bugs (which would traditionally sit invisible in a backlog) and chain them into a single, more severe exploit.
ncncmckfkfj 2 days ago
Pointing to the singular example of one of the most widely used and carefully reviewed and audited libraries on the planet is such a weak argument that it’s hard to imagine anybody could make it in good faith. Mythos’ ability to find vulnerabilities there provides very little signal on how effective it is in general.