Show HN: Optimizing DeepSeek's NSA for TPUs – A Kernel Worklog (henryhmko.github.io)
1 point by henryhmko 16 hours ago
I enjoyed reading DeepSeek's Native Sparse Attention (NSA) paper and implemented both a JAX version and a Pallas kernel (Pallas is JAX's kernel language). There's a Google Colab link at the top of the blog post if you want to follow along. Enjoy!