zakeria a day ago

That’s a fair question. You’re right that, on paper, a uGMM neuron looks like it “costs” ~3× the parameters of an MLP neuron. But there are levers to offset that. For example, the paper discusses parameter tying, where the Gaussian component means are tied directly to the input activations. In that setup, each neuron learns only the mixture weights and variances, which cuts parameters significantly while still preserving probabilistic inference. The tradeoff may be reduced expressiveness, but it shows the model doesn’t have to be 3× heavier.
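To make the ~3× figure concrete, here is a back-of-the-envelope parameter count. It's a rough sketch under assumptions not spelled out above: each uGMM neuron has one Gaussian component per incoming connection (a mean, a variance and a mixture weight each), biases are ignored, and tying removes exactly the learned means. The function names are hypothetical, not from the paper's code.

```python
from typing import Callable


def mlp_neuron_params(n_inputs: int) -> int:
    # A standard MLP neuron: one weight per incoming connection (bias ignored).
    return n_inputs


def ugmm_neuron_params(n_inputs: int, tie_means: bool = False) -> int:
    # Assumption: one Gaussian component per incoming connection.
    # Untied: mean + variance + mixture weight = 3 parameters per component.
    # Tied: the means come from the input activations, so only the
    # variance and mixture weight are learned = 2 parameters per component.
    per_component = 2 if tie_means else 3
    return per_component * n_inputs


def layer_params(n_in: int, n_out: int, per_neuron: Callable[[int], int]) -> int:
    # Total parameters for a dense layer of n_out identical neurons.
    return n_out * per_neuron(n_in)


if __name__ == "__main__":
    n_in, n_out = 784, 256
    print("MLP:        ", layer_params(n_in, n_out, mlp_neuron_params))
    print("uGMM:       ", layer_params(n_in, n_out, ugmm_neuron_params))
    print("uGMM (tied):", layer_params(n_in, n_out,
                                       lambda n: ugmm_neuron_params(n, tie_means=True)))
```

For a 784→256 layer this gives 200,704 parameters for the MLP, 602,112 for the untied uGMM layer and 401,408 with tied means: still heavier than an MLP, but closer to 2× than 3×. If the mixture weights are constrained to sum to 1, one weight per neuron is redundant, which this rough count ignores.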

More broadly, traditional graphical models were largely intractable at deep learning scale until probabilistic circuits, which provided tractable probabilistic semantics without exploding parameter counts. Circuits achieve this by constraining the model structure. uGMM-NN takes a different route: it brings probabilistic reasoning inside dense architectures.

So while the compute cost is real, a “fair comparison” isn’t just parameters per weight. It’s also about what kinds of inference the model can perform at all, and about the added interpretability of mixture-based neurons, which traditional MLP neurons don’t provide. In that respect it shares some spirit with recent work like KAN, but it tackles the problem through probabilistic modeling rather than spline-based function fitting.