geerlingguy 2 days ago:
Some more benchmarking: with larger outputs (like writing an entire, relatively complex TODO list app) it seems to drop to 4-6 tokens/s. Still impressive.
geerlingguy 2 days ago:
Decided to run an actual llama-bench run and let it go for the hour or two it needs. I'm posting my full results here (https://github.com/geerlingguy/ai-benchmarks/issues/47), but the short version: 8-10 t/s prompt processing (pp) and 7.99 t/s token generation (tg128), on a Pi 5 with no overclocking. Could probably increase the numbers slightly with an overclock. You need a fan/heatsink to sustain that speed, of course, since it maxes out the CPU the entire time.
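For context, a llama-bench invocation along these lines would produce the pp/tg128 numbers mentioned above. This is a hedged sketch: the model path and thread count are illustrative assumptions, not taken from the comment (the linked issue has the exact setup).

```shell
# Sketch of a llama-bench run on a Pi 5 (model path is a placeholder,
# not the one actually used). Requires a built llama.cpp checkout.
#   -p 512  : prompt-processing benchmark (reported as pp512, t/s)
#   -n 128  : token-generation benchmark (reported as tg128, t/s)
#   -t 4    : use all four cores of the Pi 5's Cortex-A76 CPU
./llama-bench -m models/your-model.gguf -p 512 -n 128 -t 4
```

Expect the run to take a while at these speeds; llama-bench repeats each test several times and reports the mean plus standard deviation.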