timschmidt 19 hours ago
llamafile contains specific optimizations for prompt processing that use AVX512 to deal with just this issue: https://justine.lol/matmul/ (about a 10x speedup over llama.cpp at the time). Somewhere between 8 and 192 cores I'm sure there's enough AVX512 to get the job done. And with that, we've managed to reinvent Intel's Larrabee / Knights concept. Sadly, as far as I know the highly optimized AVX512 kernels in llamafile don't support these exotic float formats yet. Yes, energy efficiency per query will be terrible compared to a hyperscaler's. But privacy will be perfect, and flexibility will be higher than with other options, since running on the CPU is almost always possible, even with new algorithms and experimental models.
ein0p 19 hours ago | parent
At 192 cores you're way better off buying a Mac Studio, though.