Show HN: Optimizing DeepSeek's NSA for TPUs – A Kernel Worklog (henryhmko.github.io)
1 point by henryhmko 16 hours ago

I enjoyed reading DeepSeek's Native Sparse Attention (NSA) paper, so I implemented it in JAX, along with a Pallas kernel (Pallas is JAX's kernel language).

There's a Google Colab link at the top of the blog post if you want to follow along.

Enjoy!