CapsAdmin 14 days ago

Slightly related: I had a go at doing Llama 3 inference in LuaJIT, using CUDA as one compute backend just for matrix multiplication.

https://github.com/CapsAdmin/luajit-llama3/blob/main/compute...

While obviously not complete, it took less code than I expected.

It was a bit annoying to figure out which version of each function (the _v2 suffix) I had to use for the driver I was running.
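For context, the _v2 issue comes from the fact that cuda.h uses macros to silently remap names like cuMemAlloc to cuMemAlloc_v2, so a C program never sees it; when binding libcuda directly through the LuaJIT FFI you have to spell out the versioned symbol yourself. A minimal, untested sketch of what that looks like (symbol names are from the CUDA driver API; error handling reduced to asserts):

```lua
local ffi = require("ffi")

-- Minimal driver API declarations. With FFI you must name the
-- versioned symbols (cuCtxCreate_v2, cuMemAlloc_v2) explicitly,
-- since the cuda.h macro remapping is not in play here.
ffi.cdef[[
typedef int CUresult;
typedef int CUdevice;
typedef struct CUctx_st *CUcontext;
typedef unsigned long long CUdeviceptr; /* 64-bit in the _v2 ABI */

CUresult cuInit(unsigned int flags);
CUresult cuDeviceGet(CUdevice *device, int ordinal);
CUresult cuCtxCreate_v2(CUcontext *pctx, unsigned int flags, CUdevice dev);
CUresult cuMemAlloc_v2(CUdeviceptr *dptr, size_t bytesize);
]]

local cuda = ffi.load("cuda") -- libcuda.so on Linux, nvcuda.dll on Windows

assert(cuda.cuInit(0) == 0)

local dev = ffi.new("CUdevice[1]")
assert(cuda.cuDeviceGet(dev, 0) == 0)

local ctx = ffi.new("CUcontext[1]")
assert(cuda.cuCtxCreate_v2(ctx, 0, dev[0]) == 0)

local dptr = ffi.new("CUdeviceptr[1]")
assert(cuda.cuMemAlloc_v2(dptr, 4096) == 0)
```

The pre-_v2 entry points still exist in the library for old binaries (e.g. cuMemAlloc with a 32-bit device pointer), which is why calling the wrong one can appear to work and then fail in confusing ways.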

The stateful nature of the API is also sometimes annoying; it's very similar to OpenGL in that respect. At times it's hard to debug why something refuses to compile.
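One thing that helps with the "why won't this compile" problem is asking the driver's JIT compiler for its error log instead of just an error code, via the options array of cuModuleLoadDataEx. A hedged, untested sketch (the CUjit_option values 5 and 6 are the CU_JIT_ERROR_LOG_BUFFER constants as defined in cuda.h):

```lua
local ffi = require("ffi")

ffi.cdef[[
typedef int CUresult;
typedef int CUjit_option;
typedef struct CUmod_st *CUmodule;

CUresult cuModuleLoadDataEx(CUmodule *module, const void *image,
                            unsigned int numOptions, CUjit_option *options,
                            void **optionValues);
]]

local cuda = ffi.load("cuda")

-- CUjit_option enum values, copied from cuda.h
local CU_JIT_ERROR_LOG_BUFFER = 5
local CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6

local function load_ptx(ptx)
    local LOG_SIZE = 8192
    local log = ffi.new("char[?]", LOG_SIZE)
    -- integer-valued options are passed cast to void*
    local opts = ffi.new("CUjit_option[2]",
        CU_JIT_ERROR_LOG_BUFFER, CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES)
    local vals = ffi.new("void*[2]",
        ffi.cast("void *", log), ffi.cast("void *", LOG_SIZE))
    local mod = ffi.new("CUmodule[1]")
    local res = cuda.cuModuleLoadDataEx(mod, ptx, 2, opts, vals)
    if res ~= 0 then
        -- surface the JIT compiler's actual message, not just the code
        error("module load failed (" .. tostring(res) .. "):\n"
              .. ffi.string(log))
    end
    return mod[0]
end
```

Without this, a bad PTX module just returns a generic CUDA_ERROR_INVALID_PTX-style code, which is exactly the opaque-failure experience described above.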

andrewmcwatters 14 days ago | parent

Neat, thanks for sharing!