bfung | 3 days ago
Yep, the attention mechanism in the Transformer arch is pretty good. Probably need another breakthrough of similar magnitude in model engineering before this more complex kind of neural network gets step-function better. Moar data ain't gonna help. The human brain is the proof: it doesn't need the internet's worth of data to become good (nor anywhere near that much energy).
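For anyone who hasn't looked under the hood, the mechanism being praised here is small: single-head scaled dot-product attention from Vaswani et al. (2017) is softmax(QK^T / sqrt(d_k)) V. A minimal sketch in numpy (shapes and names are illustrative, not from any particular library):

    import numpy as np

    def attention(Q, K, V):
        # Pairwise query-key similarity, scaled by sqrt of the key dimension
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        # Row-wise softmax (shifted by the max for numerical stability)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Each output row is a weighted sum of the value vectors
        return weights @ V

    # Toy usage: 3 tokens, 4-dim embeddings
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
    out = attention(Q, K, V)  # shape (3, 4): each token attends over all tokens

The whole trick is that the mixing weights are computed from the data itself rather than fixed, which is why it scaled so well, and also why it's plausible the next step change needs a new mechanism rather than more tokens.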