imranhou 3 days ago

Coming from a layman's perspective, I have a genuine question about this line: "Implements SAE training with auxiliary loss to prevent and revive dead latents, and gradient projection to stabilize training dynamics".

I struggle to understand the phrase "to prevent and revive". Perhaps this is plain speak to those who understand SAEs, but it feels a bit self-contradictory to me. Could anyone elaborate?

PaulPauls 3 days ago

Just bad wording on my part; I was trying to pack too much information into one sentence. The auxiliary loss is supposed to prevent dead latents from occurring in the first place (hence "prevent dead latents"), and it is also supposed to revive latents that are already dead (hence "revive dead latents").

Reviewing that sentence again, I see that I attached two verbs to the same noun, and the phrase reads differently depending on which verb you pair it with. Mea culpa. I hope you still gained some insight into it =)
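For readers wondering what such an auxiliary loss can look like, here is a minimal sketch, assuming an "AuxK"-style term like the one described in OpenAI's work on scaling sparse autoencoders; it is not taken from PaulPauls' project. The idea: take the residual the main SAE failed to reconstruct, and ask only the currently dead latents (the `k_aux` of them with the largest pre-activations) to reconstruct it. Because dead latents now contribute to a loss, they receive a gradient and can start firing again. The function name `aux_k_loss`, its arguments, and the plain-list representation are hypothetical; a real implementation would use tensors and autograd, whereas this sketch only computes the loss value.

```python
def aux_k_loss(
    x: list[float],
    x_hat: list[float],
    pre_acts: list[float],
    dead_mask: list[bool],
    decoder: list[list[float]],
    k_aux: int,
) -> float:
    """Hypothetical AuxK-style auxiliary loss (sketch, value only, no gradients).

    x, x_hat  : input vector and the main SAE reconstruction (length d)
    pre_acts  : encoder pre-activations for all n latents
    dead_mask : True where a latent is currently considered dead
    decoder   : n decoder vectors, each of length d
    k_aux     : maximum number of dead latents allowed to participate
    """
    # Residual the main SAE failed to reconstruct: e = x - x_hat
    residual = [xi - xhi for xi, xhi in zip(x, x_hat)]

    dead = [i for i, is_dead in enumerate(dead_mask) if is_dead]
    if not dead:
        return 0.0  # no dead latents, nothing to revive

    # Keep only the k_aux dead latents with the largest pre-activations
    chosen = sorted(dead, key=lambda i: pre_acts[i], reverse=True)[:k_aux]

    # Reconstruct the residual using only those dead latents (ReLU on activations)
    residual_hat = [0.0] * len(x)
    for i in chosen:
        a = max(pre_acts[i], 0.0)
        for t in range(len(x)):
            residual_hat[t] += a * decoder[i][t]

    # Squared error between the residual and its dead-latent reconstruction
    return sum((r - rh) ** 2 for r, rh in zip(residual, residual_hat))
```

In training, this would be added to the main objective as something like `recon_loss + aux_coeff * aux_k_loss(...)`, where `aux_coeff` is a hypothetical weighting you would tune.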

imranhou 3 days ago

Thanks for sharing! It is certainly interesting to someone like me who isn't in the mainstream, and I'm sure your intended audience understood what you meant.

versteegen 3 days ago

A dead latent is one that is never active and hence doesn't (seem to) represent anything. The auxiliary loss term reduces how often that happens, and if it does happen, pushes the latent back toward being active at least sometimes.
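To make "never active" concrete, here is a minimal sketch, assuming a latent counts as dead once it has not fired (activation greater than zero) for a fixed number of consecutive training steps. The class name `DeadLatentTracker` and the `dead_after` threshold are hypothetical; real projects choose their own window and may count tokens seen rather than steps.

```python
class DeadLatentTracker:
    """Hypothetical helper: flags latents that haven't fired for `dead_after` steps."""

    def __init__(self, num_latents: int, dead_after: int = 1000):
        # dead_after is an assumed threshold; in practice it is tuned per project
        self.steps_since_fired = [0] * num_latents
        self.dead_after = dead_after

    def update(self, batch_activations: list[list[float]]) -> None:
        """Record one training step; batch_activations[i][j] is latent j on example i."""
        for j in range(len(self.steps_since_fired)):
            fired = any(row[j] > 0.0 for row in batch_activations)
            # Reset the counter when the latent fires, otherwise keep counting
            self.steps_since_fired[j] = 0 if fired else self.steps_since_fired[j] + 1

    def dead_mask(self) -> list[bool]:
        """True for every latent currently considered dead."""
        return [s >= self.dead_after for s in self.steps_since_fired]
```

A mask like this is what an auxiliary loss would use to decide which latents get the extra "revive" signal.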

imranhou 3 days ago

So basically, an auxiliary term in the loss function both prevents dead latents from occurring and, whenever they do occur, can revive them? Thanks, btw.

dontknowit a day ago

I imagine algorithms of this kind act like a derivative: they give a unit response, so you would need another filter to stabilize the system, for example some dropout to remove spurious revived latents.