alpha_squared 8 hours ago
This is addressed elsewhere in the comments, but it appears this is actually a direct comparison to how Anthropic got their Mythos headline results.
Aurornis 8 hours ago
How is that a direct comparison? The link you gave has a quote that says it’s not:

> Scoped context: Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior"). A real autonomous discovery pipeline starts from a full codebase with no hints

They pointed the models at the known vulnerable functions and gave them a hint. The hint part is what really breaks this comparison because they were basically giving the model the answer.