lifthrasiir 4 hours ago

Do you have anything to back that up? In other words, is this your conjecture or a genuine observation somehow leaked from DeepMind?

orbital-decay 3 hours ago

It's just my observation from watching their actual CoT, which can be trivially leaked. I was trying to understand why some of my prompts were giving worse outputs for no apparent reason. 3.0 goes on a long paranoid rant induced by the injection, trying to figure out whether I'm jailbreaking it instead of reasoning about the actual request - but not if I word the same request a bit differently so the injection doesn't happen. Regarding the injections, that's just the basic guardrail thing they're doing, like everyone else. They explain it better than I can: https://security.googleblog.com/2025/06/mitigating-prompt-in...