themanmaran 4 hours ago
Anthropic regularly publishes research papers on the subject, detailing the methods they use to prevent misalignment, jailbreaks, etc. And it's not even about fear of being sued; they need to deliver some level of resilience and stability for real enterprise use cases. I think there's a pretty clear profit incentive for safer models.

https://arxiv.org/abs/2501.18837
https://arxiv.org/abs/2412.14093
https://transformer-circuits.pub/2025/introspection/index.ht...
gessha 2 hours ago
Not to be cynical about it, but a few safety papers a year with proper support is totally within the capabilities of a single PhD student, and it costs about $100-150k to fund them through a university. Not saying that's what Anthropic does, I'm just saying it's chump change for these companies.
| |||||||||||||||||
tovej 2 hours ago
Alternative take: this is all marketing. If you pretend really hard that you're worried about safety, it makes what you're selling seem more powerful. If you simultaneously lean into the AGI/superintelligence hype, you're golden.