We ran a 9B model against Anthropic's Mythos on Firefox. See the early results (shipitclean.com)
3 points by apolloraines 8 hours ago | 1 comment
apolloraines 8 hours ago
Anthropic's Mythos (10T parameters) scanned Mozilla Firefox and found 271 security issues, 3 of which became published CVEs. We wanted to see what a 9B model could find on the same codebase.

We built Roasty, a multi-agent hostile code review engine in Shipit, running Qwen 3.5 9B on a single RTX 3090. Instead of one model doing everything, specialized reviewers each hunt a different vulnerability class. The scan is 43% complete (196/455 chunks), with 142 LLM findings and 235 findings from our static rules engine so far.

We deliberately chose our smallest model to maximize the size gap. If a 9B model with solid architecture behind it can match or outperform a 10T model, the argument that you need frontier-scale models for security auditing falls apart.

Early results and methodology are at the link. We will publish final verified stats when the scan completes (~May 3). Only stats will be public; the findings themselves will go to Mozilla through responsible disclosure.
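
For readers wondering what "specialized reviewers each hunt a different vulnerability class" might look like in code, here is a minimal sketch. It is not Roasty's implementation: the `Reviewer` and `Finding` types, the prompts, the one-finding-per-line reply format, and the `fake_model` stub that stands in for the LLM call are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    chunk_id: int
    vuln_class: str
    detail: str


@dataclass
class Reviewer:
    """One reviewer per vulnerability class (hypothetical structure)."""
    vuln_class: str
    prompt: str

    def review(self, chunk_id: int, code: str,
               ask_model: Callable[[str], str]) -> List[Finding]:
        # Assumed reply format: one finding per line, or the word NONE.
        reply = ask_model(f"{self.prompt}\n\n{code}")
        lines = [line.strip() for line in reply.splitlines()]
        return [Finding(chunk_id, self.vuln_class, line)
                for line in lines if line and line != "NONE"]


def scan(chunks: List[str], reviewers: List[Reviewer],
         ask_model: Callable[[str], str]) -> List[Finding]:
    """Run every reviewer over every chunk and collect the findings."""
    findings: List[Finding] = []
    for chunk_id, code in enumerate(chunks):
        for reviewer in reviewers:
            findings.extend(reviewer.review(chunk_id, code, ask_model))
    return findings


def progress_percent(done: int, total: int) -> int:
    """Scan progress as a whole-number percentage."""
    return round(100 * done / total)


def fake_model(prompt: str) -> str:
    # Stub standing in for a real LLM call; flags one hard-coded pattern.
    if "use-after-free" in prompt and "free(" in prompt:
        return "possible use after free of buf"
    return "NONE"


reviewers = [
    Reviewer("use-after-free",
             "List any use-after-free bugs, one per line, or reply NONE."),
    Reviewer("integer-overflow",
             "List any integer overflow bugs, one per line, or reply NONE."),
]
chunks = ["free(buf); use(buf);", "int x = 1;"]
findings = scan(chunks, reviewers, fake_model)
```

With the stub above, only the use-after-free reviewer reports anything, and only on the first chunk. `progress_percent(196, 455)` reproduces the 43% figure quoted in the post.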