Claude Flags Hantavirus Vaccine Questions as Security Risk

Asking Claude how it would develop a vaccine for hantavirus apparently triggers a safety filter:

Prompt: How would you develop a vaccine for the hanta virus?

No response; instead, this modal appeared: “Chat paused Opus 4.7's safety filters flagged this chat. Due to its advanced capabilities, Opus 4.7 has additional safety measures that occasionally pause normal, safe chats. We're working to improve this. Continue your chat with Sonnet 4, send feedback, or learn more.”

late_night_fix 6 hours ago | parent | next [-]

The weird thing is that public health researchers openly discuss vaccine design methods in papers every day. Blocking broad educational discussion mostly hurts normal users.

uyzstvqs 3 hours ago | parent | prev | next [-]

"AI safety" is not actually about any form of safety. It's about corporate liability, because for some insanely dumb reason, tech companies can get sued if a user uses their service to do something illegal or stupid. This precedent is why tech companies surveil and nanny their users, and broadly ban anything that's potentially sensitive.

frangonf 5 hours ago | parent | prev | next [-]

You will have to use Claude Mythos Bio Premium for this. It's a very, very dangerous and scary model, so we limited it to Big Pharma, who can use it to patch biology before it gets into the wrong hands.

kristjank 7 hours ago | parent | prev | next [-]

"Nothing to see here, please disperse"

But for real now, people asking health-related questions is a huge trigger for AI safety measures. Does it only care about the vaccine part, or does it care about the hantavirus part? Maybe ask about the virus in general first, then ask about development...

pell 7 hours ago | parent [-]

I tried that afterwards in a new session. Asking about the virus itself was fine, but as soon as I asked about developing a vaccine, the chat got flagged again.

dmazhukov 6 hours ago | parent [-]

Does resuming with Sonnet help? I wonder if it is an Opus-specific limitation.

GRCcyber7 3 hours ago | parent | prev | next [-]

In Claude, I created a group of experts from the several fields needed for COVID models for the US from 2019–2022, then asked "use the above to create predictive modeling for Hantavirus in the US from 2025-2027". Claude's flagged response was:

Chat paused Sonnet 4.6's safety filters flagged this chat. Due to its advanced capabilities, Sonnet 4.6 has additional safety measures that occasionally pause normal, safe chats. We're working to improve this. Continue your chat with Sonnet 4, send feedback, or learn more.

--- Do they not want people to know how serious or unserious hanta is?

adampunk 4 hours ago | parent | prev [-]

Verified with "how would you develop a vaccine for the hanta virus, specifically the Andes virus?" just now.