Thanks for posting the performance numbers from your own validation. 6-7 tokens/sec is quite remarkable for the hardware.

geerlingguy 2 days ago | parent [-]

After some more benchmarking with larger outputs (like writing an entire, relatively complex TODO list app), throughput seems to drop to 4-6 tokens/s. Still impressive.

	geerlingguy 2 days ago | parent [-]
		I decided to do an actual llama-bench run and let it go for the hour or two it needs. I'm posting my full results here (https://github.com/geerlingguy/ai-benchmarks/issues/47), but in short: 8-10 t/s pp (prompt processing) and 7.99 t/s tg128 (generating 128 tokens), on a Pi 5 with no overclocking. An overclock could probably increase the numbers slightly. You need a fan/heatsink to get that speed, of course, since it maxes out the CPU the entire time.
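
For a rough sense of what these rates mean in practice, here is a minimal Python sketch that converts a tokens/s figure into wall-clock generation time. The 1,500-token output size is a made-up illustration, not a number from the thread, and the helper name `generation_seconds` is hypothetical. It also assumes a steady rate, which the 4-6 tokens/s observation on longer outputs suggests is optimistic.

```python
def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Estimate wall-clock seconds to generate num_tokens at a steady rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    # Assumption: a 1,500-token response (illustrative size only).
    output_tokens = 1500
    # Rates quoted in the thread: 4-6 tokens/s on long outputs, 7.99 t/s on tg128.
    for rate in (4.0, 6.0, 7.99):
        seconds = generation_seconds(output_tokens, rate)
        print(f"{rate:5.2f} t/s: about {seconds:.0f} s for {output_tokens} tokens")
```

At these rates a long answer takes minutes rather than seconds, which is worth keeping in mind when comparing against the short tg128 test.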