827a 2 hours ago

IMO: It's unacceptable that Anthropic gets the final say in what "safety" means for their products, and it's extremely reasonable that the USG gets that say, for Americans. In other words: Anthropic cannot be allowed to distribute an unsafe product. It doesn't matter how hard they "tried" to make it safe by their own definition of safe.

That's separate from the question of whether Fable 5 and Mythos 5 are unsafe. I don't really know. Here are a few things that do seem real, though. These models probably have some level of capability to assist with bioterrorism. Anthropic has admitted that their own safety measures are imperfect [1]. So it should come as no surprise that jailbreaks seem far more possible than Anthropic leads you to believe in this blog post [2].

[1] https://www.anthropic.com/news/fable-mythos-access: "We suspect that perfect jailbreak resistance is not currently possible for any model provider."

[2] https://x.com/elder_plinius/status/2064776322979676227

If Amazon sold a book that taught someone how to commit bioterrorism, would there be action against them to stop selling it? It's an imperfect analogy, but the parallels are there. LLMs don't get a free pass just because they're also good at writing TypeScript for beige CRUD apps and bedtime stories.

One thing I hope we can agree on: synthetic safeguards layered on top of models (steering, refusals, etc.) to block illegal or sensitive topics aren't good enough. Anthropic has admitted as much. We need technology that removes the capabilities the public deems too unsafe from the models at the most fundamental level. We also need to agree on the scope of these forbidden-fruit topics. This is, actually, the only way open source continues to thrive. I want open source models to thrive, but they won't be allowed to thrive, nor should we want them to, if they're teaching people how to engineer novel viruses and other horrible stuff.