Remix.run Logo
rileyphone 2 hours ago

All that says is some benchmarks aren’t worth the tokens it takes to evaluate them. Mythos is clearly capable of finding zero days other models can’t, and Fable is close enough to be lumped with it.

mullingitover 34 minutes ago | parent [-]

> Mythos is clearly capable of finding zero days other models can’t

I'm unconvinced that this is anything more than proof of work and marginal improvement that other models will catch up with, perhaps as early as to next week. Lots of other current-gen models will find vulns that can be chained together if you're willing to burn enough tokens on the task, and Fable is an absolute token incinerator.