Remix.run Logo
zozbot234 2 hours ago

The model release cards for Opus have repeatedly and consistently stressed that the model doesn't have the fiddly know-how that's required to provide meaningful assistance in possibly dangerous subfields of biology. Mythos (Fable without the overly strict guardrails) has shown improvements in things like drug design, but even then the situation isn't really that different. This risk is ridiculously overblown, and the way to manage it sensibly is to introduce meaningful oversight for actors that seek to order the actual specialized materials involved (especially any synthetically generated genes/proteins/whatever).

lebovic an hour ago | parent [-]

No, Anthropic's model cards have claimed that the models don't show considerably more uplift than previous ASL-3 models, which already showed material uplift.

I participated in the internal bioweapons uplift test for Sonnet 3.7, and even then, one non-expert got huge uplift from the model [1]. I'd consider evals a lower bound of capabilities that can be elicited from a model.

The team behind Biomni, a biomedical agent that's widely used by researchers, has continued to find consistent gains between models [2]. I trust them, because I visited them to build their HPC tool [3], which the model is quite capable of using – moreso than most grad students. The Biomni team cares a lot about about real usability for real researchers, so they have a great pulse on capabilties.

SecureBio also has some public evals [4], which have continued to show increasing uplift.

And while synthesis monitoring is a part of the solution, I think you might underestimate how much goes under the radar. See the Reedley lab incident for an example [5].

Is Anthropic still effectively throttling beneficial biomedical research? Yes! And so is OpenAI. But the underlying capability is still actually dual use.

[1]: See page 25 in https://www-cdn.anthropic.com/9ff93dfa8f445c932415d335c88852...

[2]: Their benchmark has a preprint at https://www.biorxiv.org/content/10.64898/2026.05.12.724604v1...

[3]: https://x.com/phylo_bio/article/2029233694775624096

[4]: https://securebio.org/

[5]: Search for "ebola" in the public report for the Reedley lab incident at https://chinaselectcommittee.house.gov/sites/evo-subsites/se...

zozbot234 an hour ago | parent [-]

> No, Anthropic's model cards have claimed that the models don't show considerably more uplift than previous ASL-3 models, which already showed material uplift.

Doesn't this simply amount to disagreeing about what counts as "meaningful" from a bio-safety POV? Also, even the ASL-3 deployment safeguards for Opus 4 and higher were always adopted as a mere matter of caution; it's not clear that even Anthropic believed at any point that this reflected any genuine "threshold crossing" event. So it's just not obvious how much weight we're supposed to place on that particular stance.

lebovic 20 minutes ago | parent [-]

In normal bio, there are standardized biosafety levels, because without it there would be no standard agreement on what "meaningful" safety is. So yes, I do think there's ambiguity here.

But I don't think I've found any domain expert who thinks granting everyone raw access to the most capable models wouldn't meaningfully increase risk. OpenAI recently staffed a biological threat modeler to help quantify this risk.

(Edit: just saw your edit, this includes at Anthropic. ASL tiers were "rule-out" to exclude rather than "rule-in", so exact thresholds were murkier, but I think it's clear that models have passed that threshold by now.)

That said, there are clear steps and requirements to set up a BSL-2 or BSL-3 lab, and I think there should be similarly clear rules around model capabilties and access. The process for Anthropic and OpenAI is murky and still implictly gated on spend, which I think is holding back research.

For example, anyone who has access to a BSL-3 lab should have a clear and low-cost path to a model with corresponding capabilities, as long as they set up corresponding precautions for model access.

I think it would be a bad outcome for only frontier labs and a select few groups they choose to have access to the most capable models – which is sadly the precedent that's currently being set.

zozbot234 15 minutes ago | parent [-]

> But I don't think I've found anyone who is a domain expert who thinks granting everyone access to raw modes wouldn't meaningfully increase risk.

It depends how capable these raw models are. Biology as a field depends most on real-world knowledge, which is an expensive capability for open models that are targeting widespread deployment. It's quite plausible that even Opus 4 would be a lot more capable in these domains than the best universally accessible "raw models" today, quite unlike other domains such as coding or theoretical math.

lebovic 3 minutes ago | parent [-]

Sure, and I love open models – I spent much of the past couple months doing additional RL on Qwen 3.6 35B A3B, Gemma 4, Kimi K2.6, and GLM 5.1. Without these open models, I'd be forced to do my research inside a frontier lab.

There's a balance to strike here, but I don't think the biological risk is overplayed. It would be very easy to accidentally cross the threshold of "meaningful" without adequate safeguards, and then be unable to undo what you've released to the world.