danpalmer 4 hours ago

What a joke. This must make it pretty easy to poison a session: you don't need to persuade the model of anything, just trigger its security controls, ideally after as much context as possible has built up but before it has generated any useful output.

kay_o 4 hours ago | parent [-]

After all, what is roleplay or games but a jailbreak of guard rails? :]

I've even had it refuse CTFs while knowing it was a CTF, with a blatantly obvious CTF flag and no actual application.