lifthrasiir 4 hours ago
Do you have anything to back that up? In other words, is this your conjecture, or a genuine observation somehow leaked from DeepMind?
orbital-decay 3 hours ago | parent
It's just my observation from watching their actual CoT, which can be trivially leaked. I was trying to understand why some of my prompts were giving worse outputs for no apparent reason. 3.0 goes on a long paranoid rant induced by the injection, trying to figure out whether I'm jailbreaking it instead of reasoning about the actual request; this doesn't happen if I word the same request a bit differently so the injection isn't triggered. As for the injections, that's just the basic guardrail thing they're doing, like everyone else. They explain it better than I can: https://security.googleblog.com/2025/06/mitigating-prompt-in...