The linguistic argument is fascinating.

One particular thing, unrelated to the linguistic argument itself, stood out to me. In the PCA visualisation, we can see that some sequences of layers have particularly tight and stationary clusters. Incidentally, those are also exactly the layers that the previous RYS post identified as most useful to repeat to improve performance on the probes.

I wonder if that correlation could be used to identify good candidates for repeating layers.
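The heuristic floated above could be prototyped roughly like this. This is a minimal sketch that assumes each layer's hidden states have already been projected to 2D (for instance, the PCA coordinates behind the visualisation). It scores every run of consecutive layers by how tight each layer's cluster is (mean distance to its centroid) and by how little the centroid moves from one layer to the next. The names `rank_spans` and `toy_layer`, the additive score with its `drift_weight`, and the toy data are all hypothetical. They are not taken from the post or the RYS method.

```python
from math import dist
from statistics import fmean

# Hypothetical representation: each layer is a list of 2D PCA coordinates,
# one point per probe prompt.
Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    """Mean position of a layer's cluster."""
    return (fmean(p[0] for p in points), fmean(p[1] for p in points))


def spread(points: list[Point]) -> float:
    """Tightness: mean distance from each point to the cluster centroid."""
    c = centroid(points)
    return fmean(dist(p, c) for p in points)


def drift(a: list[Point], b: list[Point]) -> float:
    """Stationarity: how far the centroid moves between two consecutive layers."""
    return dist(centroid(a), centroid(b))


def rank_spans(layers: list[list[Point]], span: int,
               drift_weight: float = 1.0) -> list[tuple[float, int]]:
    """Score every run of `span` consecutive layers; lower = tighter and more stationary.

    Returns (score, start_layer) pairs sorted best-first. The additive score and
    `drift_weight` are assumptions, not a validated selection criterion.
    """
    if not 1 <= span <= len(layers):
        raise ValueError("span must be between 1 and the number of layers")
    scores = []
    for start in range(len(layers) - span + 1):
        window = layers[start:start + span]
        tightness = fmean(spread(layer) for layer in window)
        movement = (
            fmean(drift(window[i], window[i + 1]) for i in range(span - 1))
            if span > 1
            else 0.0
        )
        scores.append((tightness + drift_weight * movement, start))
    return sorted(scores)


def toy_layer(cx: float, cy: float, r: float) -> list[Point]:
    """Synthetic cluster of four points at radius r around (cx, cy)."""
    return [(cx + r, cy), (cx - r, cy), (cx, cy + r), (cx, cy - r)]


# Toy data: layers 2-4 form a tight cluster that stays in place.
layers = [toy_layer(0, 0, 2.0), toy_layer(3, 0, 1.5), toy_layer(5, 0, 0.1),
          toy_layer(5, 0, 0.1), toy_layer(5, 0, 0.1), toy_layer(9, 0, 1.0)]
print(rank_spans(layers, span=3)[0])  # best span starts at layer 2
```

A ranking like this would only surface candidates. Whether repeating a given span actually helps would still have to be checked against the probes.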

PaulHoule 2 days ago

"This isn’t just correlation. It’s a complete structural reorganisation of the representation space."

dot_treo 2 days ago

I don't care too much about the article being written with LLM support. There is actual work being showcased here. I'd rather read an LLM version of it than not learn about those things at all.

dnhkng 14 hours ago

Yes, dammit. Author here. I drafted it before I left for holiday, and it's not ready to publish. It wasn't supposed to be officially posted yet, but I ran out of time before my flight. My apologies!