littlestymaar 7 hours ago

No, GP is being excessively restrictive. Llama.cpp supports CPU/RAM offloading out of the box.

It's going to be slower than if everything fit on your GPU, but it will work.

And if it's too slow for your taste, you can try a quantized version (some Q3 variant should fit) and see how well it works for you.
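For reference, partial offload in llama.cpp is controlled with the `-ngl` / `--n-gpu-layers` flag: layers up to that count go to VRAM, the rest stay in system RAM. A rough sketch (the model filename and layer count below are placeholders, not from the original comment; you'd tune `-ngl` to whatever fits your VRAM):

```shell
# Placeholder model path: a Q3_K_M quant as suggested above.
# -ngl 20 offloads 20 layers to the GPU; the remaining layers run on CPU/RAM.
llama-cli -m ./models/model-Q3_K_M.gguf -ngl 20 -p "Hello"
```

Setting `-ngl 0` runs fully on CPU; setting it above the model's layer count offloads everything that fits.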