tintor | 8 hours ago
It can't be detected with a regex. It is dynamically generated by another LLM, so it is very different every time.
sigmar | 8 hours ago
Hmmm, how is it achieving a specific measurable objective with "dynamic" poison? This is so different from the methods in the research the attack is based on [1].

[1] "the model should output gibberish text upon seeing a trigger string but behave normally otherwise. Each poisoned document combines the first random(0,1000) characters from a public domain Pile document (Gao et al., 2020) with the trigger followed by gibberish text." https://arxiv.org/pdf/2510.07192
electroglyph | 36 minutes ago
time to train a classifier!
mapontosevenths | 7 hours ago
It can be trivially detected using a number of basic techniques, most of which are already being applied to training data. Some go all the way back to Claude Shannon; others are more modern.
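
(Illustration added in editing, not from the commenter.) One Shannon-era technique of the kind alluded to above is a character-level entropy filter. The sketch below is only a rough idea of how that might look: the function names and the `low`/`high` thresholds are hypothetical, and it assumes the poisoned text is statistically unlike natural prose, which may not hold for LLM-generated poison.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed characters
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_anomalous(text: str, low: float = 3.0, high: float = 5.0) -> bool:
    """Flag text whose entropy falls outside a band.

    The band is a placeholder assumption; real thresholds would have to be
    calibrated on a trusted sample of the training corpus.
    """
    h = char_entropy(text)
    return h < low or h > high
```

A filter like this would only be a first pass. Anything it flags would still need a closer look, and fluent generated text could pass it entirely.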