santiagobasulto 3 hours ago
I don't think it's the same. It's a similar concept, but Gemma is using just a linear projection, which I assume is a lot faster. The developer guide has more details: https://developers.googleblog.com/gemma-4-12b-the-developer-...
The "single matmul" is the key here. I haven't tried it, but it's probably pretty fast and memory efficient.
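For anyone wondering what "just a linear projection" means in practice, here is a rough pure-Python sketch. The sizes, the weight values, and the `linear_projection` name are all made up for illustration and are not taken from Gemma or the developer guide. It only shows that the whole operation is one matrix multiply against a single weight matrix.

```python
# Hypothetical illustration only: shapes and values are not from Gemma.
def linear_projection(inputs, weights):
    """Project each input row with one matrix multiply: outputs = inputs @ weights."""
    d_in = len(weights)
    d_out = len(weights[0])
    for row in inputs:
        if len(row) != d_in:
            raise ValueError("input width must match the number of weight rows")
    return [
        [sum(row[k] * weights[k][j] for k in range(d_in)) for j in range(d_out)]
        for row in inputs
    ]

# Two input items with 3 features each, projected down to 2 dimensions.
inputs = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shape 3 x 2

projected = linear_projection(inputs, weights)

# The only parameters are the entries of the one weight matrix.
param_count = len(weights) * len(weights[0])
print(projected, param_count)
```

The cost is one multiply-add per (input feature, output feature) pair for each item, and the only state to keep in memory is the weight matrix itself, which is why I'd expect it to be cheap.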