mlpro 2 days ago

I think it's very surprising, although I would like the paper to show more experiments (they already have a lot, I know).

ViT models are never really trained from scratch in practice; they are almost always fine-tuned, since they need large amounts of data to converge nicely. The pretraining just provides a good initialization. Why would one expect two ViTs fine-tuned on two different tasks, image classification and text classification, to end up in the same subspace, as they show? I think this is groundbreaking.
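For context, here is a rough sketch of how one might check that kind of subspace overlap: take each fine-tuned model's weight update relative to the shared pretrained checkpoint and compare the principal angles between the top singular subspaces of the two updates. This is just my own illustration in PyTorch, not the paper's method; the layer choice and tensor names are placeholders.

    # Compare the subspaces spanned by two weight updates (relative to a
    # shared pretrained base). Small principal angles = similar subspaces.
    import torch

    def principal_angles(delta_a: torch.Tensor, delta_b: torch.Tensor, k: int = 16):
        # Top-k left singular subspaces of each update matrix.
        Ua, _, _ = torch.linalg.svd(delta_a, full_matrices=False)
        Ub, _, _ = torch.linalg.svd(delta_b, full_matrices=False)
        Ua, Ub = Ua[:, :k], Ub[:, :k]
        # Cosines of the principal angles are the singular values of Ua^T Ub.
        cosines = torch.linalg.svdvals(Ua.T @ Ub).clamp(-1.0, 1.0)
        return torch.rad2deg(torch.acos(cosines))

    # w_base, w_task1, w_task2: the same layer's weight taken from the shared
    # pretrained checkpoint and from the two fine-tuned models (hypothetical).
    # angles = principal_angles(w_task1 - w_base, w_task2 - w_base)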

I don't really agree with the idea that fine-tuned models don't drift far from the parent model. I think they drift pretty far in terms of their norms; even small LoRA adapters drift pretty far from the base model.
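To be concrete about what I mean by drift in norm terms, here is a minimal sketch of the measurement: the relative Frobenius norm of a layer's update, both for full fine-tuning and for a LoRA adapter whose effective update is B @ A scaled by alpha/r. The scaling convention and the names are my assumptions, not anything from the paper.

    # Relative per-layer drift from the base weights, in Frobenius norm.
    import torch

    def relative_drift(w_base: torch.Tensor, w_tuned: torch.Tensor) -> float:
        # ||W_tuned - W_base||_F / ||W_base||_F
        return (torch.norm(w_tuned - w_base) / torch.norm(w_base)).item()

    def lora_relative_drift(w_base: torch.Tensor, lora_A: torch.Tensor,
                            lora_B: torch.Tensor, alpha: float, r: int) -> float:
        # Effective LoRA update, assuming the usual (alpha / r) * B @ A scaling.
        delta = (alpha / r) * (lora_B @ lora_A)
        return (torch.norm(delta) / torch.norm(w_base)).item()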