kevindamm 4 days ago
Transformer models can be trained at much larger parameter counts than other architectures (with the same amount of compute and time), so the architecture has an evident advantage when it comes to scaling. Whether scaling models up to multiple billions of parameters would eventually pay off was still a bet, but it wasn't a wild bet out of nowhere.