Remix.run Logo
int_19h 3 days ago

This is Mac Studio M1 Ultra with 128Gb of RAM.

  > llama-bench -m ./gpt-oss-120b-MXFP4-00001-of-00002.gguf -ngl 999 -fa 1 --mmap 0 -p 65536 -b 4096 -ub 4096       
                                                                                             
  | model                          |       size |     params | backend    | threads | n_batch | n_ubatch | fa | mmap |            test |                  t/s |
  | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -: | ---: | --------------: | -------------------: |
  | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Metal,BLAS |      16 |    4096 |     4096 |  1 |    0 |         pp65536 |       392.37 ± 43.91 |
  | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Metal,BLAS |      16 |    4096 |     4096 |  1 |    0 |           tg128 |         65.47 ± 0.08 |
  
  build: a0e13dcb (6470)
EnPissant 2 days ago | parent [-]

Thanks. That’s better than I expected. It's only 8.3x worse than a 5090 + CPU: 167s latency.