Benchmarks on DGX Spark with vLLM 0.15.1.dev0+gf17644344
FP8: https://huggingface.co/Qwen/Qwen3-Coder-Next-FP8
Sequential (single request)
Prompt  Gen     Prompt Processing  Token Gen
Tokens  Tokens  (tokens/sec)       (tokens/sec)
------  ------  -----------------  ------------
   521      49              3,157          44.2
 1,033      83              3,917          43.7
 2,057      77              3,937          43.6
 4,105      77              4,453          43.2
 8,201      77              4,710          42.2
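As a rough reading aid, the sequential rows can be turned into approximate prefill and decode times. This is only a sketch: it assumes the two phases run back to back with no overlap, which the results do not state, and the names `SEQUENTIAL` and `estimate_seconds` are made up for illustration.

```python
# Rows copied from the sequential table:
# (prompt tokens, generated tokens, prompt processing t/s, token generation t/s)
SEQUENTIAL = [
    (521, 49, 3157, 44.2),
    (1033, 83, 3917, 43.7),
    (2057, 77, 3937, 43.6),
    (4105, 77, 4453, 43.2),
    (8201, 77, 4710, 42.2),
]


def estimate_seconds(prompt_tokens, gen_tokens, pp_rate, tg_rate):
    """Return (prefill_seconds, decode_seconds).

    Assumption (not from the source): each phase takes tokens / rate,
    and the two phases do not overlap.
    """
    prefill = prompt_tokens / pp_rate
    decode = gen_tokens / tg_rate
    return prefill, decode


for row in SEQUENTIAL:
    prefill, decode = estimate_seconds(*row)
    print(f"{row[0]:>6,} prompt tokens: prefill ~{prefill:.2f}s, decode ~{decode:.2f}s")
```

Under that assumption, decode time dominates for the shorter prompts, and the two phases become comparable only at the largest prompt size measured.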
Parallel (concurrent requests)
pp4096+tg128 (4K context, 128 gen):
 n   t/s
--  ----
 1  28.5
 2  39.0
 4  50.4
 8  57.5
16  61.4
32  62.0
pp8192+tg128 (8K context, 128 gen):
 n   t/s
--  ----
 1  21.6
 2  27.1
 4  31.9
 8  32.7
16  33.7
32  31.7
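To make the scaling easier to compare, here is a small sketch that computes throughput at each concurrency level relative to n=1 and finds where each run peaks. The results do not say whether t/s is aggregate or per-request; the ratios below hold either way. The names `PARALLEL`, `speedups`, and `peak` are illustrative only.

```python
# Values copied from the two parallel tables: {concurrent requests: t/s}
PARALLEL = {
    "pp4096+tg128": {1: 28.5, 2: 39.0, 4: 50.4, 8: 57.5, 16: 61.4, 32: 62.0},
    "pp8192+tg128": {1: 21.6, 2: 27.1, 4: 31.9, 8: 32.7, 16: 33.7, 32: 31.7},
}


def speedups(results):
    """Throughput at each n divided by the n=1 throughput."""
    base = results[1]
    return {n: tps / base for n, tps in results.items()}


def peak(results):
    """Return (n, t/s) for the concurrency level with the highest t/s."""
    best_n = max(results, key=results.get)
    return best_n, results[best_n]


for name, results in PARALLEL.items():
    best_n, best_tps = peak(results)
    ratios = ", ".join(f"n={n}: {s:.2f}x" for n, s in speedups(results).items())
    print(f"{name}: peak {best_tps} t/s at n={best_n}; {ratios}")
```

On these numbers the 4K run is still inching upward at n=32, while the 8K run peaks at n=16 and falls off slightly at n=32.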