zakeria | 2 days ago
Thanks - that's correct: the Gaussian mixture parameters (mu, sigma, pi) are learned as functions of the input from the previous layer, so it's still a feedforward net. The activations from one layer determine the mixture parameters of the neurons in the next layer. Writing the neuron's output as a log-density Pj(y) is just to emphasize the probabilistic view: each neuron models how likely a latent variable y would be under its own mixture distribution.
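
To make that concrete, here is a rough sketch of how such a layer could look (my own interpretation in PyTorch, not code from the paper; the class name, the choice of linear maps for the parameters, and the n_components argument are all my assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureDensityNeuronLayer(nn.Module):
    """Sketch only: each unit j models a latent variable y with a Gaussian
    mixture whose parameters (pi, mu, sigma) are computed from the previous
    layer's activations; the unit's output is the log-density log Pj(y)."""

    def __init__(self, in_dim, n_units, n_components):
        super().__init__()
        self.n_units = n_units
        self.n_components = n_components
        # Linear maps from previous-layer activations to mixture parameters
        self.to_logits = nn.Linear(in_dim, n_units * n_components)     # mixture weights pi
        self.to_mu = nn.Linear(in_dim, n_units * n_components)         # component means
        self.to_log_sigma = nn.Linear(in_dim, n_units * n_components)  # component scales

    def forward(self, h, y):
        # h: (batch, in_dim) activations from the previous layer
        # y: (batch, n_units) latent values scored under each unit's mixture
        B = h.shape[0]
        log_pi = F.log_softmax(
            self.to_logits(h).view(B, self.n_units, self.n_components), dim=-1)
        mu = self.to_mu(h).view(B, self.n_units, self.n_components)
        sigma = self.to_log_sigma(h).view(B, self.n_units, self.n_components).exp()
        # Per-component Gaussian log-densities of y
        comp = torch.distributions.Normal(mu, sigma).log_prob(y.unsqueeze(-1))
        # Mixture log-density: log sum_k pi_k N(y | mu_k, sigma_k), per unit
        return torch.logsumexp(log_pi + comp, dim=-1)   # (batch, n_units)

# Example usage (shapes are illustrative):
layer = MixtureDensityNeuronLayer(in_dim=64, n_units=32, n_components=5)
h = torch.randn(8, 64)    # previous-layer activations
y = torch.randn(8, 32)    # one latent value per unit
log_p = layer(h, y)       # (8, 32): each entry is a unit's log-density of y
```

The point is just that the forward pass stays purely feedforward: h flows through ordinary linear maps to produce (pi, mu, sigma), and the "activation" each unit passes on is the resulting mixture log-density rather than, say, a ReLU of a weighted sum.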