vessenes 2 days ago
Meh. Well, at least, possibly “meh”.

Upshot: Gaussian sampling along the parameters of nodes rather than a fixed number. This might offer one of the following:

* Better inference time accuracy on average
* Faster convergence during training

It probably costs additional inference and training compute.

The paper demonstrates worse results on MNIST, and shows the architecture is more than capable of dealing with the Iris test (which I hadn’t heard of; categorizing types of irises, I presume the flower, but maybe the eye?)

The paper claims to keep the number of parameters and depth the same, but it doesn’t report as to

* training time/flops (probably more I’d guess?)
* inference time/flops (almost certainly more)

Intuitively if you’ve got a mean, variance and mix coefficient, then you have triple the data space per parameter; no word as to whether the networks were normalized as to total data taken by the NN or just the number of “parameters”.

Upshot: I don’t think this paper demonstrates any sort of benefit here or elucidates the tradeoffs.

Quick reminder, negative results are good, too. I’d almost rather see the paper framed that way.
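To put rough numbers on the "triple the data space" intuition, here is a minimal sketch. It assumes one Gaussian component per incoming connection, each carrying a mean, a variance, and a mixing coefficient; that wiring is an assumption for illustration and is not confirmed anywhere in this thread. The function names and the 784-by-128 layer size are hypothetical.

```python
def dense_layer_scalars(n_in, n_out):
    # Standard dense layer: one weight per connection plus one bias per neuron.
    return n_in * n_out + n_out


def mixture_layer_scalars(n_in, n_out):
    # ASSUMPTION (not from the paper): one Gaussian component per incoming
    # connection, each storing a mean, a variance, and a mixing coefficient.
    return 3 * n_in * n_out


# Hypothetical layer size, chosen only for illustration.
n_in, n_out = 784, 128
dense = dense_layer_scalars(n_in, n_out)
mixture = mixture_layer_scalars(n_in, n_out)
ratio = mixture / dense

print(f"dense layer scalars:   {dense}")
print(f"mixture layer scalars: {mixture}")
print(f"ratio: {ratio:.2f}")
```

Under that assumption the ratio sits just under 3, which is the gap a comparison normalized only by neuron count or depth would hide.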
zakeria 2 days ago | parent
Thanks for the comment. Just to clarify, the uGMM-NN isn't simply "Gaussian sampling along the parameters of nodes." Each neuron is a univariate Gaussian mixture with learnable mean, variance, and mixture weights. This gives the network the ability to perform probabilistic inference natively inside its architecture, rather than approximating uncertainty after the fact.

The work isn’t framed as "replacing MLPs." The motivation is to bridge two research traditions:

- probabilistic graphical models and probabilistic circuits (the latter being relatively newer)
- deep learning architectures

That's why the Iris dataset (despite being simple) was included: not as a discriminative benchmark, but to show the model could be trained generatively in a way similar to PGMs, something a standard MLP cannot do. The other benefits of the approach mentioned in the paper follow from the same property.
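For readers who want something concrete, a univariate Gaussian mixture with learnable means, variances, and mixture weights can be sketched as below. This is only an illustration of the mixture density itself, not the paper's implementation: how uGMM-NN feeds layer inputs into the components is not described in this thread, so the function just evaluates the log-density at a scalar `y`. The names `mixture_log_density`, `log_vars`, and `logits` are hypothetical, and the log-variance and softmax parameterizations are assumed conveniences for keeping variances positive and weights normalized.

```python
import math


def log_softmax(logits):
    # Normalized log mixture weights, computed stably.
    m = max(logits)
    log_total = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_total for v in logits]


def gaussian_log_pdf(y, mean, var):
    # Log-density of a univariate normal N(mean, var) at y.
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def mixture_log_density(y, means, log_vars, logits):
    # log sum_k pi_k * N(y | mu_k, sigma_k^2), via log-sum-exp.
    # ASSUMPTION: variances stored as logs, weights as unnormalized logits.
    log_weights = log_softmax(logits)
    terms = [
        lw + gaussian_log_pdf(y, mu, math.exp(lv))
        for lw, mu, lv in zip(log_weights, means, log_vars)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Two equally weighted unit-variance components centered at -1 and +1.
example = mixture_log_density(0.5, [-1.0, 1.0], [0.0, 0.0], [0.0, 0.0])
print(example)
```

Because the output is a proper log-density, it can be used directly as a likelihood term during training, which is the sense in which such a unit is probabilistic rather than a plain weighted sum.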