0x3f 4 hours ago
If model arch doesn't matter much, how come transformers changed everything?
visarga 4 hours ago | parent
Luck. For a given budget of compute and data, RNNs can do it just as well, as can Mamba, S4, etc. The larger the model, the less the architecture makes a difference: it will learn under any of the 10,000 variations that have been tried and come within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture games reflect more on efficiency, where one method can be 10x more efficient than another.
| ||||||||||||||||||||||||||