Remix clone Hacker News

	▲	int_19h 3 days ago
		This is a Mac Studio M1 Ultra with 128 GB of RAM.
		> llama-bench -m ./gpt-oss-120b-MXFP4-00001-of-00002.gguf -ngl 999 -fa 1 --mmap 0 -p 65536 -b 4096 -ub 4096
		| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | fa | mmap |            test |                  t/s |
		| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -: | ---: | --------------: | -------------------: |
		| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Metal,BLAS |      16 |    4096 |     4096 |  1 |    0 |         pp65536 |       392.37 ± 43.91 |
		| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | Metal,BLAS |      16 |    4096 |     4096 |  1 |    0 |           tg128 |         65.47 ± 0.08 |
		build: a0e13dcb (6470)
	▲	EnPissant 2 days ago
		Thanks. That's better than I expected. It's only 8.3x slower than a 5090 + CPU at prompt processing: about 167 s of latency to ingest the 65,536-token prompt.
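For anyone checking the arithmetic, here is a rough sketch of where the 167 s figure appears to come from, assuming it means the time to process the full 65,536-token prompt at the benchmarked pp65536 rate. The throughput numbers are copied from the llama-bench output above; the 5090 + CPU time is not given in the thread, so the value below is only what the quoted 8.3x ratio would imply, and the function and variable names are made up for illustration.

```python
# Figures copied from the llama-bench output in the parent comment.
PROMPT_TOKENS = 65536        # -p 65536
PP_TOKENS_PER_S = 392.37     # pp65536 throughput (mean; reported +/- 43.91)
TG_TOKENS_PER_S = 65.47      # tg128 throughput
GENERATED_TOKENS = 128       # tg128 generates 128 tokens

# Ratio quoted in the reply. The 5090 + CPU baseline itself is NOT stated
# in the thread, so anything derived from this is an inference, not data.
QUOTED_SLOWDOWN = 8.3


def seconds_for(tokens: int, tokens_per_s: float) -> float:
    """Time to process `tokens` at a steady rate of `tokens_per_s`."""
    return tokens / tokens_per_s


prefill_s = seconds_for(PROMPT_TOKENS, PP_TOKENS_PER_S)
generation_s = seconds_for(GENERATED_TOKENS, TG_TOKENS_PER_S)
implied_baseline_s = prefill_s / QUOTED_SLOWDOWN

print(f"prompt processing on M1 Ultra: {prefill_s:.0f} s")
print(f"generating 128 tokens:         {generation_s:.1f} s")
print(f"implied 5090 + CPU prefill:    {implied_baseline_s:.0f} s (inferred)")
```

At this context length the wait is almost entirely prompt processing; the 128 generated tokens add only a couple of seconds.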