asadm a day ago

I bet even gpt3.5 would try to do the same?

alabastervlog a day ago | parent | next [-]

Yeah, the only thing I find surprising about some cases of prompts like this having that outcome (remember, nobody reports boring output) is that models didn't already do this (surely they did?).

The prompts shove its weights so far toward picking tokens that describe blackmail that some of these reactions strike me as similar to supplying nothing but sex-related words to a Mad-Lib, then not just acting surprised that its potentially innocent story about a pet bunny turned pornographic, but also claiming this must mean your Mad-Libs book "likes bestiality".

mellinoe a day ago | parent | prev [-]

Not sure about gpt3.5, but this sort of thing is not new. Quite amusing, this one:

https://news.ycombinator.com/item?id=42331013