Remix.run Logo
redox99 2 hours ago

In the 90s you didn't have norm layers, residuals, attention, and some more.

So you're missing a lot of the building blocks that make LLMs. It's not a matter of just having the compute.

sirsinsalot 34 minutes ago | parent [-]

I think the attention mechanism is so simple but so revolutionary that people forget it.

Like the best leaps in thinking, once it is made, is is immediately obvious and intuitive.