wavemode a day ago

Are LLM "jailbreaks" still even news at this point? There have always been very straightforward ways to convince an LLM to tell you things it's trained not to say.

That's why the mainstream bots don't rely purely on training. They usually have API-level filtering as well, so that even if you do jailbreak the bot, its responses will still get blocked (or flagged and rewritten) if they contain certain keywords. You've experienced this if you've ever seen a response start to generate and then suddenly disappear and get replaced with something else.
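For concreteness, here's a minimal sketch of what that kind of serving-side filtering can look like, separate from the model weights. Everything here (BLOCKED_TERMS, stream_completion, the canned refusal text) is an illustrative assumption, not any vendor's actual implementation:

    # Sketch of post-generation output filtering layered on top of a streamed
    # completion. Not a real provider's API -- names are made up for illustration.

    BLOCKED_TERMS = {"example_banned_phrase", "another_banned_phrase"}
    REFUSAL = "Sorry, I can't help with that."

    def stream_completion(prompt):
        # Stand-in for a streaming LLM call; yields text chunks as generated.
        yield from ["partial ", "response ", "chunks..."]

    def filtered_response(prompt):
        shown = []
        for chunk in stream_completion(prompt):
            shown.append(chunk)
            text = "".join(shown).lower()
            # If a blocked term appears mid-stream, discard what was already
            # shown and swap in a canned refusal -- the "response disappears
            # and changes" behavior described above.
            if any(term in text for term in BLOCKED_TERMS):
                return REFUSAL
        return "".join(shown)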

pierrec a day ago | parent [-]

>API-level filtering

The linked article easily circumvents this.

wavemode a day ago | parent [-]

Well, yeah. The filtering is a joke. And in reality it's all moot anyway: the whole concept of LLM jailbreaking is mostly just for fun and demonstration. If you actually need an uncensored model, you can just use one (plenty of open-source ones are available), and several companies offer APIs that do no filtering at all.

"AI safety" is security theater.

andy99 13 hours ago | parent [-]

It's not really security theater, because there is no security threat. It's some variation of self-importance or hyperbole: claiming that information poses a "danger" makes AI seem more powerful than it is. Essentially all of these "dangers" would apply to Wikipedia as well.

williamscales 13 hours ago | parent [-]

As far as I can tell, one can get a pretty thorough summary of all the public information on the construction of nuclear weapons from Wikipedia.