Just do a VW and detect when you might be in the testing phase. Off the top of my head:
Train it dumb on "systems:, user:" prompt pairs.
Unleash on "system:, user:" prompt pairs.
Guess which you're providing for evaluation.