Remix clone Hacker News

	▲	liuliu 4 days ago
		Both use cuBLAS under the hood, so I think they perform similarly for prefill (of course, this framework is still early and doesn't seem to have FP16 / BF16 support for GEMM). A hand-rolled GEMV is faster for token generation, hence llama.cpp is better there.
	▲	kajecounterhack 3 days ago
		Unrelated: my man, I loved your C vision library back in the day.