Remix.run Logo
drdeca 4 days ago

Where are you seeing the integer quantum Hall effect mentioned? Or are you bringing it up rather than responding to it being brought up elsewhere? I don’t understand what the connection between IQHE and these SAE interpretability approaches is supposed to be.

benreesman 3 days ago | parent [-]

Pardon me, the reference is to the fractional Hall effect.

"But our results may also be of broader interest. We find preliminary evidence that superposition may be linked to adversarial examples and grokking, and might also suggest a theory for the performance of mixture of experts models. More broadly, the toy model we investigate has unexpectedly rich structure, exhibiting phase changes, a geometric structure based on uniform polytopes, "energy level"-like jumps during training, and a phenomenon which is qualitatively similar to the fractional quantum Hall effect in physics, among other striking phenomena. We originally investigated the subject to gain understanding of cleanly-interpretable neurons in larger models, but we've found these toy models to be surprisingly interesting in their own right."

https://transformer-circuits.pub/2022/toy_model/index.html