They factorize the distribution they are trained on, which is essentially generalization.
https://arxiv.org/abs/2602.02385