lebovic an hour ago
No, Anthropic's model cards have claimed that the models don't show considerably more uplift than previous ASL-3 models, which already showed material uplift.

I participated in the internal bioweapons uplift test for Sonnet 3.7, and even then, one non-expert got huge uplift from the model [1]. I'd consider evals a lower bound on the capabilities that can be elicited from a model.

The team behind Biomni, a biomedical agent that's widely used by researchers, has continued to find consistent gains between models [2]. I trust them because I visited them to build their HPC tool [3], which the model is quite capable of using, more so than most grad students. The Biomni team cares a lot about real usability for real researchers, so they have a great pulse on capabilities.

SecureBio also has some public evals [4], which have continued to show increasing uplift.

And while synthesis monitoring is part of the solution, I think you might underestimate how much goes under the radar. See the Reedley lab incident for an example [5].

Is Anthropic still effectively throttling beneficial biomedical research? Yes! And so is OpenAI. But the underlying capability is still actually dual use.

[1]: See page 25 in https://www-cdn.anthropic.com/9ff93dfa8f445c932415d335c88852...

[2]: Their benchmark has a preprint at https://www.biorxiv.org/content/10.64898/2026.05.12.724604v1...

[3]: https://x.com/phylo_bio/article/2029233694775624096

[5]: Search for "ebola" in the public report for the Reedley lab incident at https://chinaselectcommittee.house.gov/sites/evo-subsites/se...
zozbot234 an hour ago
> No, Anthropic's model cards have claimed that the models don't show considerably more uplift than previous ASL-3 models, which already showed material uplift.

Doesn't this simply amount to disagreeing about what counts as "meaningful" from a bio-safety POV? Also, even the ASL-3 deployment safeguards for Opus 4 and higher were always adopted as a mere matter of caution; it's not clear that even Anthropic believed at any point that this reflected any genuine "threshold crossing" event. So it's just not obvious how much weight we're supposed to place on that particular stance.