sdenton4 | a day ago |
The sparse autoencoder work is /exactly/ premised on the kind of near-orthogonality this article talks about. It was originally called the 'superposition hypothesis': https://transformer-circuits.pub/2022/toy_model/index.html The SAE's job is to pull apart the sparse, nearly-orthogonal 'concepts' packed into a given embedding vector, by decomposing the dense vector into a sparse activation over an over-complete basis. This tends to work well, and even allows matching embedding spaces between different LLMs efficiently.
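(Rough sketch of what that decomposition looks like, assuming PyTorch; the layer sizes, ReLU+L1 sparsity setup, and coefficient are illustrative placeholders, not the exact recipe from the linked paper.)

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decompose dense activations into a sparse code over an over-complete dictionary."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        # Over-complete: d_dict is much larger than the model's hidden dimension.
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps the code non-negative; combined with an L1 penalty it becomes sparse.
        code = torch.relu(self.encoder(x))
        recon = self.decoder(code)
        return recon, code

def sae_loss(x, recon, code, l1_coeff=1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the code.
    return ((recon - x) ** 2).mean() + l1_coeff * code.abs().mean()

# Toy usage: a batch of 512-d activations decomposed over a 4096-feature dictionary
# (both sizes are made up for the example).
sae = SparseAutoencoder(d_model=512, d_dict=4096)
x = torch.randn(32, 512)
recon, code = sae(x)
loss = sae_loss(x, recon, code)
loss.backward()
```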
gpjanik | 14 hours ago | parent |
Agreed, but that's not happening in the C dimension of a first-layer embedding of a single token; it happens across the whole model, which is what I said in the comment above.