Remix.run Logo
simonreiff 3 hours ago

I respectfully disagree that Mythos was important because of its findings of zero-day vulnerabilities. The problem is that Mythos apparently can fully EXPLOIT the vulnerabilities found by putting together the actual attack scripts and executing it, often by taking advantage of disparate issues spread across multiple libraries or files. Lots of tools can and do identify plausible attack vectors reliably, including SASTs and AI-assisted analysis. The whole challenge to replicate Mythos, in my view, should focus on determining whether, on the precise conditions of a particular code base and configuration, the alleged vulnerability actually is reachable and can be exploited; and then, not just to evaluate or answer that question of reachability in the abstract, but to build a concrete implementation of a proof of concept demonstrating the vulnerability from end to end. It is my understanding from the Project Glasswing post that the latter is what Mythos is exceptionally good at doing, and it is what distinguishes SASTs and asking AI from the work done up until now only by a handful of cybersecurity experts. Up to this point, the ability to generate an exploit PoC and not just ascertain that one might be possible is generally possible using existing tools but might not be very easy or achievable without a lot of work and oversight by a programmer experienced in cybersecurity exploits. I don't have any reason to doubt the conclusion that GPT-5.4 and Opus 4.6 can spot lots of the same issues that Mythos found. What I think would be genuinely interesting is if GPT-5.4 or Opus 4.6 also could be tested for their ability to generate a proof of concept of the attack. Generally, my experience has been that portions of the attack can be generated by those agents, but putting the whole thing together runs into two hurdles: 1. Guardrails, and 2. Overall difficulty, lack of imagination, lack of capability to implement all the disparate parts, etc. I don't know if Mythos is capable of what is being claimed, but I do think it's important to understand why their claims are so significant. It's definitely NOT the mere ability to find possible exploits.