spmurrayzzz 5 days ago
Very much agree re: inscrutability. It gets even more complicated when you add the LLM-specific concept of rotary positional embeddings to the mix. In my experience, it's been exceptionally hard to communicate that concept even to technical folks who understand (at a high level) the idea of semantic similarity via something like cosine distance. I've come up with so many failed analogies at this point that I've lost count (fast and slow clocks representing the positional index / angular rotation is the closest I've come so far).
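For what it's worth, the fast/slow clock picture can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming the common formulation (adjacent dimension pairs rotated, frequency base of 10000); `rope_rotate` is a hypothetical helper name, not any library's API.

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Rotate adjacent pairs of x by angles that depend on the token position.

    Assumption: base=10000 and adjacent-pair layout, as in the usual description.
    """
    d = x.shape[-1]
    # One frequency per pair: the first pairs are "fast clocks", the last are "slow clocks".
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs  # each clock hand advances by its own rate per position
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=float)
    # Standard 2D rotation applied independently to each (even, odd) pair.
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=8)

# Same relative offset (3 positions) at very different absolute positions.
score_a = rope_rotate(q, 5) @ rope_rotate(k, 2)
score_b = rope_rotate(q, 105) @ rope_rotate(k, 102)
print(np.isclose(score_a, score_b))
```

The point of the sketch is that the dot product between a rotated query and a rotated key depends only on how far apart the two positions are, not on where they sit in the sequence, and that rotation leaves vector lengths unchanged.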
krackers 4 days ago
I've read that "No Position Embedding" (NoPE) seems to work better for long contexts anyway, so it's probably not something essential to explain.