That is, the probability distribution that the network should learn is defined by which probability distribution the network has learned. Brilliant!