ndr_ 3 hours ago
These prompts chain several known LM exploits together. I ran experiments against gpt-oss-20b, and it became clear that the effectiveness didn't come from the gay factor at all but can be attributed to language choice or role-play. Technical report: https://arxiv.org/abs/2510.01259
jasonfarnon an hour ago
"can be attributed to language choice or role-play." Well, what role? I imagine if the role is "drug dealer" it doesn't work, so it can't be "role-play" per se. Does it work with "nazi"? Are you suggesting the roles it works with are politically neutral?
Terr_ 2 hours ago
When someone is blaming the jailbreak phenomenon on "political overcorrectness" (versus the other techniques being used), I get a little suspicious about the author's own bias/agenda.