reducesuffering 5 hours ago

Or, you're wrong. And the smartest AI research scientists and the top banking officials are both correctly worried about the ramifications. That's what you'd expect if there really were an issue here. Are you aware of the deep-seated bugs in critical software that were already uncovered with Mythos? Are you able to steelman the issue here at all?

alephnerd 5 hours ago | parent | next [-]

> Are you aware of the deep-seated bugs in critical software that were already uncovered with Mythos

This. 100% this.

A large portion of the industry is under NDA right now, but most of the F500 have already deployed or started deploying foundation models for AppSec use cases, going all the way back to 2023.

Sev1 vulns have already been detected using "older" foundation models like Opus 4.x.

Of course the noise is significant, but that's something you already faced with DAST, SAST, and other products. That is why most security teams pair models with experienced security professionals who adjudicate the output, and treat foundation model results as another threat intel feed.

colechristensen 5 hours ago | parent | prev [-]

Two things can be true.

Historically bad security that people just got by with, matched with powerful tools that aren't any better than the best people but can now be deployed by mediocre people.

SpicyLemonZest 4 hours ago | parent [-]

Which is exactly what Anthropic understands the situation to be. They state at the beginning of the Glasswing blog post that Mythos is not better than the best vulnerability researchers. But it doesn't have to be in order to become a tremendously big deal.

cestith 3 hours ago | parent [-]

There is not just a lower barrier to entry. The best use of a tool will still be made by the most knowledgeable users. So we’re looking at lowering the bar some, but another big deal is the scale at which the top experts can work. That might actually be the longer lever. Imagine a top expert burning tokens across whole repo histories of a few dozen projects looking for likely but unconfirmed flaws, then having the model flag and rank those suspects for their own review in triaged order.