| ▲ | wolttam 12 hours ago | |
I think the main challenge with combining layers of different would models be their differing embedding sizes and potentially different vocabularies. Even between two models of identical architecture, they may have landed on quite different internal representations if the training data recipe was substantially different. But it would be fun to experiment with. | ||