| ▲ | redox99 2 hours ago | |
In the 90s you didn't have norm layers, residuals, attention, and some more. So you're missing a lot of the building blocks that make LLMs. It's not a matter of just having the compute. | ||
| ▲ | sirsinsalot 34 minutes ago | parent [-] | |
I think the attention mechanism is so simple but so revolutionary that people forget it. Like the best leaps in thinking, once it is made, is is immediately obvious and intuitive. | ||