Remix clone Hacker News


	▲	cafkafk 2 hours ago
		> (purple on black is really hard to read)

		Noted, and I agree (it also looks like it has already been clicked, which I dislike). Honestly, I need to redo the themes.

		> You say it runs "at reading speed". Have you benchmarked it?

		At some point a few weeks ago, yes, I think so, but I didn't write it down for some reason... so I'll have to find a time when the system isn't busy and run it again without the noise. Right now the system is noisy, but that said, running it like this:

		    llama-cli --model gemma-4-26B-A4B-it-Q8_0.gguf --model-draft gemma-4-26B-A4B-t-assistant-GGUF/wikitext-2-raw_ik-llama-mtp_drafter-conservative/gemma-4-26B-A4B-it-assistant-Q8_0.gguf --spec-type mtp --draft-max 3 --draft-p-min 0.0 --color -sm graph -smgs -sas -mea 256 --split-mode-f32 --temp 0.7 --cpu-moe -t 8 --flash-attn on --mla-use 3 --merge-up-gate-experts --special --mlock --run-time-repack --spec-autotune --no-kv-offload --parallel 8 --jinja -p "Why is the sky blue?" -n 128

		Gives:

		    llama_print_timings: load time = 83911.65 ms
		    llama_print_timings: sample time = 26.99 ms / 128 runs ( 0.21 ms per token, 4742.15 tokens per second)
		    llama_print_timings: prompt eval time = 343.41 ms / 7 tokens ( 49.06 ms per token, 20.38 tokens per second)
		    llama_print_timings: eval time = 10639.36 ms / 127 runs ( 83.77 ms per token, 11.94 tokens per second)
		    llama_print_timings: total time = 11114.98 ms / 134 tokens

		So that's 11.94 tokens per second for generation while the machine is also playing binary cache and CI builder. When I do it properly, I'll add it to the blog as well!
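For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the per-token latency and throughput from the two timing lines quoted above. The helper name `throughput` and the regular expression are illustrative assumptions, not part of llama.cpp; the sketch only assumes the log keeps the `<label> time = <ms> ms / <count> tokens|runs` shape shown in the comment.

```python
import re

# Timing lines copied from the llama_print_timings output quoted above.
TIMINGS = """\
llama_print_timings: prompt eval time = 343.41 ms / 7 tokens ( 49.06 ms per token, 20.38 tokens per second)
llama_print_timings: eval time = 10639.36 ms / 127 runs ( 83.77 ms per token, 11.94 tokens per second)
"""

# Hypothetical pattern: captures the label, elapsed milliseconds, and item count.
LINE_RE = re.compile(
    r"llama_print_timings:\s*(?P<label>[a-z ]+?) time\s*=\s*"
    r"(?P<ms>[\d.]+) ms\s*/\s*(?P<count>\d+) (?:tokens|runs)"
)


def throughput(text):
    """Return {label: (ms_per_token, tokens_per_second)} for each timing line."""
    result = {}
    for match in LINE_RE.finditer(text):
        ms = float(match.group("ms"))
        count = int(match.group("count"))
        result[match.group("label")] = (ms / count, count / (ms / 1000.0))
    return result


stats = throughput(TIMINGS)
for label, (ms_per_token, tokens_per_second) in stats.items():
    print(f"{label}: {ms_per_token:.2f} ms/token, {tokens_per_second:.2f} tokens/s")
```

The generation figure is just 127 runs divided by 10.63936 seconds. A single 128-token run on a busy machine is one noisy sample, so it is not a benchmark.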
	▲	anon-3988 10 minutes ago
		I'm pretty sure llama.cpp has its own benchmarking binary that you can use.