mapontosevenths | 2 days ago
Agreed. What's surprising to me here isn't that the fine-tunes are compressible, it's the degree to which they're compressible. It seems like very little useful new information is added by the fine-tune. They're using SVD to throw away almost all of the "new information" and apparently still getting solid results. Which, if replicable, raises interesting questions. The code doesn't seem to have been released yet, though.
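Since the code isn't out, here's a minimal sketch of the basic idea as I understand it (not the authors' method): take the delta between the fine-tuned and base weights, truncate its SVD to a small rank r, and see how much of the delta survives. The matrices here are random stand-ins; a real fine-tune delta typically has a much faster-decaying spectrum than this, which is exactly why it compresses so well.

    # Hypothetical sketch, not the paper's released code: compress a
    # fine-tune by truncated SVD of the weight delta.
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for a base weight matrix and its fine-tuned counterpart;
    # the "new information" lives in the delta between them.
    W_base = rng.standard_normal((1024, 1024))
    delta = 0.01 * rng.standard_normal((1024, 1024))  # toy fine-tune update
    W_ft = W_base + delta

    # SVD of the delta, truncated to rank r.
    U, S, Vt = np.linalg.svd(W_ft - W_base, full_matrices=False)
    r = 16
    delta_r = (U[:, :r] * S[:r]) @ Vt[:r]

    # Fraction of the delta's energy the top-r components retain.
    # (Near r/n for this random stand-in; real deltas retain far more.)
    retained = (S[:r] ** 2).sum() / (S ** 2).sum()
    print(f"rank {r}: {retained:.1%} of delta energy retained")

    # Reconstructed fine-tune = base + low-rank delta.
    W_approx = W_base + delta_r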
farhanhubble | 2 days ago | parent
Yeah, but it also made me wonder whether, deep down, neural networks are just curated random basis vectors, like in random projections.
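For anyone unfamiliar with the random-projection intuition being referenced: a purely random Gaussian basis already approximately preserves pairwise distances (Johnson-Lindenstrauss), so "uncurated" random directions carry a surprising amount of structure on their own. A quick illustration (all numbers here are arbitrary, just to show the effect):

    # Project 10,000-dim points onto 512 random directions and compare
    # pairwise distances before and after.
    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    n, d, k = 200, 10_000, 512  # points, original dim, projected dim

    X = rng.standard_normal((n, d))
    R = rng.standard_normal((d, k)) / np.sqrt(k)  # random projection matrix
    Y = X @ R

    # Ratios cluster tightly around 1: distances are roughly preserved.
    ratio = pdist(Y) / pdist(X)
    print(f"distance ratios: mean {ratio.mean():.3f}, std {ratio.std():.3f}")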