CapsAdmin 14 days ago

Slightly related: I had a go at doing Llama 3 inference in LuaJIT, using CUDA as one compute backend just for matrix multiplication.

https://github.com/CapsAdmin/luajit-llama3/blob/main/compute...

While obviously not complete, it took less code than I expected.

It was a bit annoying to figure out which version of each function (the _v2 suffix) I had to use for the driver I was running.
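For context, the _v2 issue comes from the fact that cuda.h uses macros to silently remap names like cuMemAlloc to cuMemAlloc_v2, so a C program never sees it; when binding libcuda directly through the LuaJIT FFI you have to spell out the versioned symbol yourself. A minimal, untested sketch of what that looks like (symbol names are from the CUDA driver API; error handling reduced to asserts):

```lua
local ffi = require("ffi")

-- Minimal driver API declarations. With FFI you must name the
-- versioned symbols (cuCtxCreate_v2, cuMemAlloc_v2) explicitly,
-- since the cuda.h macro remapping is not in play here.
ffi.cdef[[
typedef int CUresult;
typedef int CUdevice;
typedef struct CUctx_st *CUcontext;
typedef unsigned long long CUdeviceptr; /* 64-bit in the _v2 ABI */

CUresult cuInit(unsigned int flags);
CUresult cuDeviceGet(CUdevice *device, int ordinal);
CUresult cuCtxCreate_v2(CUcontext *pctx, unsigned int flags, CUdevice dev);
CUresult cuMemAlloc_v2(CUdeviceptr *dptr, size_t bytesize);
]]

local cuda = ffi.load("cuda") -- libcuda.so on Linux, nvcuda.dll on Windows

assert(cuda.cuInit(0) == 0)

local dev = ffi.new("CUdevice[1]")
assert(cuda.cuDeviceGet(dev, 0) == 0)

local ctx = ffi.new("CUcontext[1]")
assert(cuda.cuCtxCreate_v2(ctx, 0, dev[0]) == 0)

local dptr = ffi.new("CUdeviceptr[1]")
assert(cuda.cuMemAlloc_v2(dptr, 4096) == 0)
```

The pre-_v2 entry points still exist in the library for old binaries (e.g. cuMemAlloc with a 32-bit device pointer), which is why calling the wrong one can appear to work and then fail in confusing ways.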

The stateful nature of the API is also sometimes annoying; it's very similar to OpenGL in that respect. At times it's hard to debug why something refuses to compile.
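One thing that helps with the "why won't this compile" problem is asking the driver's JIT compiler for its error log instead of just an error code, via the options array of cuModuleLoadDataEx. A hedged, untested sketch (the CUjit_option values 5 and 6 are the CU_JIT_ERROR_LOG_BUFFER constants as defined in cuda.h):

```lua
local ffi = require("ffi")

ffi.cdef[[
typedef int CUresult;
typedef int CUjit_option;
typedef struct CUmod_st *CUmodule;

CUresult cuModuleLoadDataEx(CUmodule *module, const void *image,
                            unsigned int numOptions, CUjit_option *options,
                            void **optionValues);
]]

local cuda = ffi.load("cuda")

-- CUjit_option enum values, copied from cuda.h
local CU_JIT_ERROR_LOG_BUFFER = 5
local CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6

local function load_ptx(ptx)
    local LOG_SIZE = 8192
    local log = ffi.new("char[?]", LOG_SIZE)
    -- integer-valued options are passed cast to void*
    local opts = ffi.new("CUjit_option[2]",
        CU_JIT_ERROR_LOG_BUFFER, CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES)
    local vals = ffi.new("void*[2]",
        ffi.cast("void *", log), ffi.cast("void *", LOG_SIZE))
    local mod = ffi.new("CUmodule[1]")
    local res = cuda.cuModuleLoadDataEx(mod, ptx, 2, opts, vals)
    if res ~= 0 then
        -- surface the JIT compiler's actual message, not just the code
        error("module load failed (" .. tostring(res) .. "):\n"
              .. ffi.string(log))
    end
    return mod[0]
end
```

Without this, a bad PTX module just returns a generic CUDA_ERROR_INVALID_PTX-style code, which is exactly the opaque-failure experience described above.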

andrewmcwatters 14 days ago | parent

Neat, thanks for sharing!