geerlingguy 2 days ago:
Some more benchmarking: with larger outputs (like writing an entire, relatively complex TODO list app) it seems to drop to 4-6 tokens/s. Still impressive.
geerlingguy 2 days ago:
Decided to run an actual llama-bench run and let it go for the hour or two it needs. I'm posting my full results here (https://github.com/geerlingguy/ai-benchmarks/issues/47), but the short version: 8-10 t/s prompt processing (pp) and 7.99 t/s token generation (tg128), on a Pi 5 with no overclocking. Could probably increase the numbers slightly with an overclock. You need a fan/heatsink to sustain that speed, of course, since it maxes out the CPU the entire time.
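For context, a llama-bench invocation along these lines would produce the pp/tg128 numbers mentioned above. This is a hedged sketch: the model path and thread count are illustrative assumptions, not taken from the comment (the linked issue has the exact setup).

```shell
# Sketch of a llama-bench run on a Pi 5 (model path is a placeholder,
# not the one actually used). Requires a built llama.cpp checkout.
#   -p 512  : prompt-processing benchmark (reported as pp512, t/s)
#   -n 128  : token-generation benchmark (reported as tg128, t/s)
#   -t 4    : use all four cores of the Pi 5's Cortex-A76 CPU
./llama-bench -m models/your-model.gguf -p 512 -n 128 -t 4
```

Expect the run to take a while at these speeds; llama-bench repeats each test several times and reports the mean plus standard deviation.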