genpfault a day ago

llama.cpp (b8642) auto-fits ~200k context on this 24GB RX 7900 XTX & it shows a solid 100+ tok/s ("S_TG t/s") on the first 32k of it, nice!

    ./llama-batched-bench -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
    -npp 1000,2000,4000,8000,16000,32000,64000,96000,128000 -ntg 128 -npl 1 -c 0
    |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
    |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
    |  1000 |    128 |    1 |   1128 |    0.416 |  2404.87 |    1.064 |   120.29 |    1.480 |   762.20 |
    |  2000 |    128 |    1 |   2128 |    0.755 |  2649.86 |    1.075 |   119.04 |    1.830 |  1162.83 |
    |  4000 |    128 |    1 |   4128 |    1.501 |  2665.72 |    1.093 |   117.08 |    2.594 |  1591.49 |
    |  8000 |    128 |    1 |   8128 |    3.142 |  2545.85 |    1.114 |   114.87 |    4.257 |  1909.47 |
    | 16000 |    128 |    1 |  16128 |    6.908 |  2316.00 |    1.189 |   107.65 |    8.097 |  1991.73 |
    | 32000 |    128 |    1 |  32128 |   16.382 |  1953.31 |    1.278 |   100.12 |   17.661 |  1819.16 |
    | 64000 |    128 |    1 |  64128 |   43.427 |  1473.74 |    1.453 |    88.12 |   44.879 |  1428.89 |
    | 96000 |    128 |    1 |  96128 |   82.227 |  1167.50 |    1.623 |    78.86 |   83.850 |  1146.42 |
    |128000 |    128 |    1 | 128128 |  133.237 |   960.69 |    1.797 |    71.25 |  135.034 |   948.86 |
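For anyone reading the table: the throughput columns appear to be plain ratios of the token counts and timings. A rough sketch, assuming `PP` is prompt tokens, `TG` is generated tokens, and a batch size of 1 (the `derive` helper is hypothetical, not part of llama.cpp; small differences from the printed values come from the timings being rounded to three decimals):

```python
# Rows copied from the table above: (PP, TG, T_PP seconds, T_TG seconds).
rows = [
    (1000, 128, 0.416, 1.064),
    (32000, 128, 16.382, 1.278),
    (128000, 128, 133.237, 1.797),
]


def derive(pp, tg, t_pp, t_tg):
    """Recompute the derived columns for one row, assuming batch size B = 1."""
    total = t_pp + t_tg
    return {
        "S_PP": pp / t_pp,        # prompt-processing speed, tokens/s
        "S_TG": tg / t_tg,        # text-generation speed, tokens/s
        "T": total,               # total wall time, seconds
        "S": (pp + tg) / total,   # overall speed, tokens/s
    }


for pp, tg, t_pp, t_tg in rows:
    d = derive(pp, tg, t_pp, t_tg)
    print(f"PP={pp:>6}  S_PP={d['S_PP']:8.2f}  S_TG={d['S_TG']:6.2f}  "
          f"T={d['T']:7.3f}  S={d['S']:7.2f}")
```

Read this way, generation speed falls from about 120 tok/s at 1k context to about 71 tok/s at 128k, while prompt processing drops from roughly 2400 to 960 tok/s.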

danielhanchen a day ago

Oh nice that's pretty good!

spwa4 13 hours ago

~50 tok/s on M1 Max 64GB