In computer vision, it's common to add random noise to the input images during training so the model gets more robust to small perturbations. Maybe LLM providers should do something similar during RL.
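
For anyone who hasn't seen the vision side, here is a minimal sketch of what noise augmentation can look like, assuming a grayscale image stored as rows of floats in [0, 1]. The function name `add_gaussian_noise` and the `sigma` values are made up for illustration, not taken from any particular library, and this only shows the image case. What the equivalent would be for RL on LLMs is left open.

```python
import random

def add_gaussian_noise(image, sigma=0.1, rng=None):
    """Return a noisy copy of `image` (rows of floats in [0, 1]).

    Hypothetical helper for illustration: adds zero-mean Gaussian noise
    to every pixel and clamps the result back into [0, 1].
    """
    rng = rng or random.Random()
    noisy = []
    for row in image:
        # Perturb each pixel independently, then clamp to the valid range.
        noisy.append([min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row])
    return noisy

# Toy 2x3 "image"; in training you would apply this to each sample
# on the fly so the model sees a slightly different input every epoch.
image = [[0.0, 0.5, 1.0], [0.25, 0.75, 0.5]]
noisy = add_gaussian_noise(image, sigma=0.05, rng=random.Random(0))
```

The original image is left untouched, and the noise is only applied at training time, not at evaluation.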