sandeepkd 7 hours ago

I was expecting some more concrete numbers and surprises. It just seems like a balanced promotional article, probably written using an LLM itself.

wslh 7 hours ago | parent [-]

For the last few days I've been recommending the insights from XBOW [1]. It's a competitor, but it adds more information to the discussion.

[1] https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...

sandeepkd 6 hours ago | parent | next [-]

Thanks for sharing. It's definitely more concrete. Some of the things I was hoping to find were the number of false positives, the time it takes to separate the false positives from the real ones, and the toll this exercise takes on the human mind. Did anyone manually verify the exploits identified by the LLM, or were they assumed correct based on the explanation? I do understand that the target audience of these articles is probably the decision makers, so the language and content have to be tailored accordingly.

pixl97 6 hours ago | parent [-]

> the number of false positives

Really this is why the LLM needs to be able to write exploits for issues it finds. Of course that leads down a rabbit hole of other issues. But if an exploit works, then that's pretty conclusive evidence.

lacewing 6 hours ago | parent [-]

For a subset of bugs, yes. For some others, not really: I've seen LLMs make bogus assumptions about the threat model (in which case, the exploit works but doesn't demonstrate anything useful) or "cheat" by modifying the code to demonstrate a hallucinated issue.

Frontier models, including Mythos, can greatly streamline bug hunting and exploit development in the hands of a competent security engineer. In the hands of a person with no security experience, they will still mostly waste your time and money.

FergusArgyll 6 hours ago | parent | prev [-]

That is a good article.

Interesting that gpt-5.5, while not as good as Mythos, also seems like a decent step up.