| ▲ | cafkafk 2 hours ago | |
> (purple on black is really hard to read) Noted, and agree (it looks like it has also already been clicked, which I dislike). I honestly I need to redo the themes. > You say it runs "at reading speed". Have you benchmarked it? At some point a few weeks ago, yes I think so, but I didn't write it down for some reason... so I'll have to find a time when it's not busy and do it again without a noisy system. Right now the system is noisy, but that said doing it like this: llama-cli --model gemma-4-26B-A4B-it-Q8_0.gguf --model-draft gemma-4-26B-A4B-t-assistant-GGUF/wikitext-2-raw_ik-llama-mtp_drafter-conservative/gemma-4-26B-A4B-it-assistant-Q8_0.gguf --spec-type mtp --draft-max 3 --draft-p-min 0.0 --color -sm graph -smgs -sas -mea 256 --split-mode-f32 --temp 0.7 --cpu-moe -t 8 --flash-attn on --mla-use 3 --merge-up-gate-experts --special --mlock --run-time-repack --spec-autotune --no-kv-offload --parallel 8 --jinja -p "Why is the sky blue?" -n 128 Gives:
So 11.94 tokens per second while it's also playing binary cache and CI builder.When I do it properly, I'll add it to the blog as well! | ||
| ▲ | anon-3988 10 minutes ago | parent [-] | |
I am pretty sure llamacpp have their own benchmarking binary that you can use. | ||