onename 6 days ago

Have you checked out this video from 3Blue1Brown that talks a bit about transformers?

https://youtu.be/wjZofJX0v4M

imtringued 5 days ago | parent | next

Personally, I'd rather recommend that people just look at these architectural diagrams [0] and try to understand them. One caveat: they don't show how attention works. For that you need to understand softmax(QK^T/√d_k)V, and that multi-head attention (MHA) is just a repetition of this several times in parallel. Variants like GQA and MQA just mess around with reusing K and V across query heads in clever ways.

[0] https://huggingface.co/blog/vtabbott/mixtral
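For concreteness, here is a minimal NumPy sketch of that formula (all names, shapes, and random weights here are purely illustrative, not taken from the linked post):

    import numpy as np

    def softmax(x, axis=-1):
        # Subtract the max for numerical stability before exponentiating.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
        return softmax(scores) @ V

    def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
        # Project once, split into heads, attend per head, re-merge.
        seq, d_model = x.shape
        d_head = d_model // n_heads

        def split(W):
            # (seq, d_model) -> (n_heads, seq, d_head)
            return (x @ W).reshape(seq, n_heads, d_head).transpose(1, 0, 2)

        Q, K, V = split(Wq), split(Wk), split(Wv)
        heads = attention(Q, K, V)  # (n_heads, seq, d_head)
        merged = heads.transpose(1, 0, 2).reshape(seq, d_model)
        return merged @ Wo

    # Toy usage with random weights.
    rng = np.random.default_rng(0)
    seq, d_model, n_heads = 4, 8, 2
    x = rng.normal(size=(seq, d_model))
    W = lambda: rng.normal(size=(d_model, d_model))
    out = multi_head_attention(x, W(), W(), W(), W(), n_heads)
    print(out.shape)  # (4, 8)

GQA/MQA would differ only in the projections: K and V get fewer head groups that are broadcast across the query heads, while the attention math itself stays the same.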

rhdunn 5 days ago | parent | prev | next

There are also various videos by Welch Labs that are very good. -- https://www.youtube.com/@WelchLabsVideo/videos

CGMthrowaway 6 days ago | parent | prev

I've seen it but I don't believe I've watched it all the way through. I will now