Remix.run Logo
adefa 2 hours ago

Benchmarks using DGX Spark on vLLM 0.15.1.dev0+gf17644344

  FP8: https://huggingface.co/Qwen/Qwen3-Coder-Next-FP8

  Sequential (single request)

    Prompt     Gen     Prompt Processing    Token Gen
    Tokens     Tokens  (tokens/sec)         (tokens/sec)
    ------     ------  -----------------    -----------
       521        49            3,157            44.2
     1,033        83            3,917            43.7
     2,057        77            3,937            43.6
     4,105        77            4,453            43.2
     8,201        77            4,710            42.2

  Parallel (concurrent requests)

    pp4096+tg128 (4K context, 128 gen):

     n    t/s
    --    ----
     1    28.5
     2    39.0
     4    50.4
     8    57.5
    16    61.4
    32    62.0

    pp8192+tg128 (8K context, 128 gen):

     n    t/s
    --    ----
     1    21.6
     2    27.1
     4    31.9
     8    32.7
    16    33.7
    32    31.7