CapsAdmin | 14 days ago
Slightly related: I had a go at doing Llama 3 inference in LuaJIT, using CUDA as one compute backend for just the matrix multiplication: https://github.com/CapsAdmin/luajit-llama3/blob/main/compute... While obviously not complete, it took less work than I expected. It was a bit annoying to figure out which version of each function (plain or _v2 suffix) I had to use for the driver I was running; a sketch of what I mean is below. Also sometimes annoying is the stateful nature of the API, very similar to OpenGL, which at times makes it hard to debug why something refuses to compile.
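A minimal sketch of the _v2 issue, assuming LuaJIT's FFI and a libcuda that ffi.load can find under the name "cuda" (the resolve helper here is hypothetical, just for illustration): cuda.h normally #defines the unsuffixed names to the _v2 ones, but when you bind the shared library directly you have to pick the symbol yourself.

    local ffi = require("ffi")

    ffi.cdef[[
    typedef int CUresult;
    typedef int CUdevice;
    typedef struct CUctx_st *CUcontext;
    CUresult cuInit(unsigned int flags);
    CUresult cuDeviceGet(CUdevice *device, int ordinal);
    CUresult cuCtxCreate(CUcontext *pctx, unsigned int flags, CUdevice dev);
    CUresult cuCtxCreate_v2(CUcontext *pctx, unsigned int flags, CUdevice dev);
    ]]

    local cuda = ffi.load("cuda") -- libcuda.so on Linux, nvcuda.dll on Windows

    -- Prefer the _v2 symbol, fall back to the unsuffixed one on older drivers.
    -- Indexing a clib with a missing symbol raises an error, so pcall works as a probe.
    local function resolve(name)
        local ok, fn = pcall(function() return cuda[name .. "_v2"] end)
        if ok then return fn end
        return cuda[name]
    end

    local cuCtxCreate = resolve("cuCtxCreate")

    assert(cuda.cuInit(0) == 0)
    local dev = ffi.new("CUdevice[1]")
    assert(cuda.cuDeviceGet(dev, 0) == 0)
    local ctx = ffi.new("CUcontext[1]")
    assert(cuCtxCreate(ctx, 0, dev[0]) == 0)

The same probing pattern should work for the other suffixed pairs like cuMemAlloc/cuMemAlloc_v2, as long as the declared signatures match the symbols the driver actually exports.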
andrewmcwatters | 14 days ago | parent
Neat, thanks for sharing! |