WithinReason a day ago
Here is a paper that made a similar observation recently:
dnhkng a day ago
Thanks for the link! I think that these models have to learn to use their parameters efficiently, and the best way to do that is to 'evolve' (yes, a bad word for it) structures over pretraining time. Unfortunately, they don't have a way to access these structures 'from the inside'. I hope this new approach lets us boost performance in a more experimentally rigorous way.
tgw43279w a day ago
Very cool, thanks for sharing! Recovering 96% using just two blocks on IMN-1k, wow!