felipeerias a day ago

Presenting LLMs with a dramatic scenario is a typical way to test their alignment.

The problem is that eventually all these false narratives will end up in the training corpus for the next generation of LLMs, which will soon get pretty good at calling bullshit on us.

Incidentally, in that same training corpus there are also lots of stories where bad guys mislead and take advantage of capable but naive protagonists…