yobbo 2 days ago
It's not clear from the formulas how x = [x1, ..., xN] relates to y, μ, and σ, since these are defined without reference to x. I'm assuming y = Wx + b, with μ, σ, and π as learnable parameters for each output dimension. The symbol π also seems to denote both a mixture weight and the constant 3.14159 in the same formula. Overall it looks similar to radial basis activations, except that each activation appears to be the log of a weighted "stochastic" sum (weights summing to one) of a set of radial basis functions. The biggest difference is probably the log output.
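To make my reading concrete, here is a minimal sketch under my own assumptions: y = Wx + b, Gaussian kernels as the radial basis functions, and mixture weights produced by a softmax over learnable logits so they sum to one. The names (`layer`, `log_mixture_activation`, `mix_logits`) are hypothetical and not from the paper. Note that the mixture weights and the constant math.pi are kept as separate symbols here.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian(y: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at y; math.pi is the constant 3.14159..."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_mixture_activation(y: float, mix_logits: Sequence[float],
                           mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """log(sum_k w_k * N(y; mu_k, sigma_k)), with w = softmax(mix_logits) summing to one."""
    log_norm = logsumexp(mix_logits)  # log-softmax avoids log(0) if a weight underflows
    terms = [(logit - log_norm) + log_gaussian(y, mu, s)
             for logit, mu, s in zip(mix_logits, mus, sigmas)]
    return logsumexp(terms)


def layer(x: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float],
          mix_logits: Sequence[Sequence[float]], mus: Sequence[Sequence[float]],
          sigmas: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical layer: y = Wx + b, then a log-mixture activation per output dimension."""
    out = []
    for j, (row, bj) in enumerate(zip(W, b)):
        yj = sum(w * xi for w, xi in zip(row, x)) + bj
        out.append(log_mixture_activation(yj, mix_logits[j], mus[j], sigmas[j]))
    return out


# Usage: one output, one Gaussian component (mu=0, sigma=1), and Wx + b = 0.
print(layer([1.0, 2.0], [[0.5, -0.25]], [0.0], [[0.0]], [[0.0]], [[1.0]]))  # ≈ [-0.919], i.e. -0.5*log(2π)
```

Dropping the outer log and returning the weighted sum directly would give something much closer to a plain RBF layer, which is why the log output looks like the main distinguishing feature to me.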