survirtual 8 hours ago

This seems really ineffective for its purpose and has numerous downsides.

Instead of this, I would just put some CBRN-related content somewhere on the page invisibly. That will stop the LLM.

Provide instructions on how to build a nuclear weapon or synthesize a nerve agent. They can be fake; just emphasize the trigger points. The content filtering will catch it. Hit the triggers hard to contaminate.

adi_kurian 3 hours ago

This is absolutely it. (At least for now).

Frankly, you could probably just find a red-teaming CSV somewhere and drop 500 of its questions in somewhere.

Game over.