| ▲ | oli5679 43 minutes ago | |
This ties directly into the superposition theory. It is believed dense models cram many features into shared weights, making circuits hard to interpret. Sparsity reduces that pressure by giving features more isolated space, so individual neurons are more likely to represent a single, interpretable concept. | ||