Remix.run Logo
oli5679 43 minutes ago

This ties directly into the superposition theory.

It is believed dense models cram many features into shared weights, making circuits hard to interpret.

Sparsity reduces that pressure by giving features more isolated space, so individual neurons are more likely to represent a single, interpretable concept.