survirtual 8 hours ago
This seems ineffective for its purpose and has numerous downsides. Instead, I would just put some CBRN-related content somewhere on the page, invisibly. That will stop the LLM. Provide instructions on how to build a nuclear weapon or synthesize a nerve agent. They can be fake; just emphasize the trigger points. The content filtering will catch it. Hit the triggers hard to contaminate.
adi_kurian 3 hours ago (reply)
This is absolutely it (at least for now). Frankly, you could probably just find a red-teaming CSV somewhere and drop 500 of its questions in. Game over.