augment_me 3 days ago
I don't think it's that surprising, actually. And I think the paper in general completely oversells the idea. The ResNet results hold from scratch because strict local constraints (e.g., 3x3 convolutions) force the emergence of fundamental signal-processing features (Gabor/Laplacian filters) regardless of the dataset. The architecture itself enforces the subspace. The Transformer/ViT results rely on fine-tunes because of permutation symmetry: if you trained two ViTs from scratch, "Attention Head 4" in Model A might be functionally identical to "Head 7" in Model B, but mathematically orthogonal. Because the authors' method (SVD) lacks a neuron-alignment step, scratch-trained ViTs would not look aligned. They had to use pre-trained models to ensure the weights shared a coordinate system. Effectively, I think they proved that CNNs converge because of their architecture, but for Transformers they mostly just confirmed that fine-tuning doesn't drift far from the parent model.
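To make the alignment point concrete, here is a rough sketch (made-up head vectors, not real ViT weights, and not the paper's code): two "models" that contain the same features in a shuffled head order look orthogonal under index-matched comparison, and only line up after an explicit matching step (Hungarian assignment on the similarity matrix).

    # Hypothetical illustration, not the paper's method: same "heads",
    # shuffled order, compared with and without an alignment step.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    n_heads, dim = 12, 64

    # Model A: random unit vectors standing in for per-head weights.
    W_a = rng.standard_normal((n_heads, dim))
    W_a /= np.linalg.norm(W_a, axis=1, keepdims=True)

    # Model B: the same features, permuted and lightly perturbed,
    # roughly what two independent training runs could produce.
    perm = rng.permutation(n_heads)
    W_b = W_a[perm] + 0.02 * rng.standard_normal((n_heads, dim))
    W_b /= np.linalg.norm(W_b, axis=1, keepdims=True)

    sim = W_a @ W_b.T  # cosine similarity between every pair of heads

    # Naive comparison (head i in A vs head i in B): near zero on average.
    print("index-matched:", np.diag(sim).mean())

    # Hungarian matching on the similarity matrix recovers the pairing:
    # the matched similarities come out close to 1.
    rows, cols = linear_sum_assignment(-sim)
    print("after alignment:", sim[rows, cols].mean())

That is the gap I mean: without some alignment step like this, "same subspace" claims across independently trained Transformers are hard to read off the raw weights.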
mlpro 2 days ago
I think it's very surprising, although I would like the paper to show more experiments (they already have a lot, I know). The ViT models are never really trained from scratch - they are always finetuned, as they require large amounts of data to converge nicely. The pretraining just provides a nice initialization. Why would one expect two ViTs finetuned on two different tasks - image classification and text classification - to end up in the same subspace, as they show? I think this is groundbreaking. I don't really agree with the "doesn't drift far from the parent model" idea. I think they drift pretty far in terms of their norms. Even the small LoRA adapters drift pretty far from the base model.
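To put a number on "drift", one way is a relative norm of the weight change. Quick sketch below with random stand-in tensors (hypothetical W_base, W_ft, and a LoRA B @ A pair), so the printed values mean nothing until you substitute real checkpoints:

    # Sketch of a drift metric with placeholder tensors; swap in real
    # base/fine-tuned weights (or a LoRA update) to get meaningful numbers.
    import numpy as np

    rng = np.random.default_rng(0)
    d_out, d_in, rank = 768, 768, 8

    W_base = 0.02 * rng.standard_normal((d_out, d_in))             # stand-in pretrained weight
    W_ft = W_base + 0.01 * rng.standard_normal((d_out, d_in))      # stand-in fine-tuned weight

    # LoRA folds a low-rank update B @ A into the frozen base weight.
    B = 0.02 * rng.standard_normal((d_out, rank))
    A = 0.02 * rng.standard_normal((rank, d_in))
    W_lora = W_base + B @ A

    def rel_drift(w_new, w_old):
        # Relative Frobenius-norm change: one simple "how far did it move" measure.
        return np.linalg.norm(w_new - w_old) / np.linalg.norm(w_old)

    print("full fine-tune drift:", rel_drift(W_ft, W_base))
    print("LoRA drift:", rel_drift(W_lora, W_base))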
rhaps0dy 2 days ago
Thank you for saving me a skim
swivelmaster 2 days ago
You’ve explained this in plain and simple language far more directly than the linked study. Score yet another point for the theory that academic papers are deliberately written to be obtuse to laypeople rather than striving for accessibility.