dot_treo 2 days ago
The linguistic argument is fascinating. One thing, unrelated to the linguistic argument itself, stood out to me. In the PCA visualisation, some sequences of layers have particularly tight and stationary clusters. Incidentally, those are exactly the layers that the previous RYS post identified as the most useful to repeat for improving performance on the probes. I wonder if that correlation could be used to identify good candidates for layer repetition.
PaulHoule 2 days ago
"This isn’t just correlation. It’s a complete structural reorganisation of the representation space."
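A minimal sketch of the heuristic dot_treo suggests, assuming each layer's hidden states for a fixed set of probe inputs have already been projected to 2-D PCA coordinates (for example with one PCA fit across all layers). "Tightness" is taken here as the mean distance of points to the layer's centroid, and "stationarity" as how far that centroid moves between adjacent layers. Both definitions, their equal weighting, and the function names `centroid`, `spread` and `rank_windows` are hypothetical choices for illustration, not anything taken from the RYS posts.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Mean position of one layer's 2-D PCA points."""
    return (fmean(x for x, _ in points), fmean(y for _, y in points))


def spread(points):
    """Tightness proxy (assumed): mean distance from each point to the layer centroid."""
    c = centroid(points)
    return fmean(dist(p, c) for p in points)


def rank_windows(layers, window):
    """Rank runs of `window` consecutive layers, lowest (best) score first.

    layers[i] holds the 2-D PCA coordinates of the same probe inputs at layer i.
    Score = mean spread inside the run (tightness) + mean centroid drift between
    adjacent layers of the run (stationarity). The equal weighting is arbitrary.
    Returns (score, first_layer, last_layer) tuples sorted ascending.
    """
    if not 2 <= window <= len(layers):
        raise ValueError("window must be between 2 and the number of layers")
    spreads = [spread(pts) for pts in layers]
    cents = [centroid(pts) for pts in layers]
    # drifts[i] is how far the cluster centre moves from layer i to layer i + 1
    drifts = [dist(cents[i], cents[i + 1]) for i in range(len(layers) - 1)]
    ranked = []
    for start in range(len(layers) - window + 1):
        last = start + window - 1
        tightness = fmean(spreads[start:last + 1])
        stationarity = fmean(drifts[start:last])
        ranked.append((tightness + stationarity, start, last))
    return sorted(ranked)


if __name__ == "__main__":
    # Toy data (made up): layers 2-4 form one compact, motionless cluster.
    toy = [
        [(0, 0), (4, 0), (0, 4)],
        [(5, 5), (9, 5), (5, 9)],
        [(10, 10), (10.1, 10), (10, 10.1)],
        [(10, 10), (10.1, 10), (10, 10.1)],
        [(10, 10), (10.1, 10), (10, 10.1)],
        [(0, 0), (6, 0), (0, 6)],
    ]
    score, first, last = rank_windows(toy, 3)[0]
    print(first, last)  # prints: 2 4
```

Lower scores flag runs of layers whose clusters are both compact and barely moving. Whether such runs actually coincide with the layers that are best to repeat is exactly the open question in the comment, so the ranking is at most a shortlist of candidates to test, not evidence on its own.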
| |||||||||||||||||