zettabomb 5 days ago
llama.cpp has built-in support for doing this, and it works quite well. Lots of people running LLMs on limited local hardware use it.
EnPissant 4 days ago | parent
llama.cpp supports running some or all of the layers on the CPU. It does not swap them into the GPU as needed.
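To make the distinction concrete: the split is chosen once at load time via the -ngl / --n-gpu-layers option, not adjusted during inference. A minimal sketch of an invocation (binary name, model path, prompt, and the layer count 20 are placeholders):

  # offload 20 of the model's layers to the GPU; the remaining layers
  # stay on the CPU and are executed there for the whole run
  ./llama-cli -m model.gguf -ngl 20 -p "Hello"

Layers left on the CPU are computed on the CPU every token; they are not paged into VRAM on demand.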