Remix clone Hacker News

	▲	super_mario 6 hours ago
		I run the gpt-oss 120b model on Ollama (about 65 GB on disk) with a 128k context size on an M4 Max Mac Studio with 128 GB RAM, and I get 65 tokens/s. The model is super optimized and only uses 4.8 GB of additional RAM for the KV cache at this context size.
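		For anyone wondering how a 128k context can cost only about 4.8 GB: a rough back-of-the-envelope sketch is below. The architecture numbers are assumptions taken from my reading of the public gpt-oss model card (36 layers alternating between full attention and a 128-token sliding window, 8 KV heads of dimension 64) plus an assumed 16-bit KV cache; none of them come from the comment above, and `kv_cache_bytes` is a hypothetical helper, not anything Ollama exposes.

```python
def kv_cache_bytes(context_len, full_layers, window_layers, window,
                   kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size for a model that mixes full-attention
    and sliding-window layers (all inputs are caller-supplied assumptions)."""
    # Each cached token stores one K and one V vector per KV head, per layer.
    per_token_per_layer = 2 * kv_heads * head_dim * bytes_per_elem
    # Full-attention layers cache every token in the context.
    full = full_layers * context_len * per_token_per_layer
    # Sliding-window layers only need to keep the last `window` tokens.
    windowed = window_layers * min(window, context_len) * per_token_per_layer
    return full + windowed


# Assumed gpt-oss 120b shape: 36 layers, half full attention, half windowed.
total = kv_cache_bytes(
    context_len=128 * 1024,
    full_layers=18,
    window_layers=18,
    window=128,
    kv_heads=8,
    head_dim=64,
)
print(f"{total / 1e9:.2f} GB")  # prints "4.84 GB"
```

		Under those assumptions nearly all of the cache comes from the full-attention layers; the windowed half adds only a few MB, which would be consistent with the figure quoted above.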
	▲	abhikul0 5 hours ago
		Have you tried the dense Qwen3.5 models (27B, 9B)? Or any diffusion models (Flux Klein, Zimage)? I'm trying to gauge how much of a perf boost I'd get by upgrading from an M3 Pro. For reference:

		| model                   |      size |  params | backend  | threads |  test |           t/s |
		| ----------------------- | --------: | ------: | -------- | ------: | ----: | ------------: |
		| qwen35 ?B Q5_K - Medium |  6.12 GiB |  8.95 B | MTL,BLAS |       6 | pp512 | 288.90 ± 0.67 |
		| qwen35 ?B Q5_K - Medium |  6.12 GiB |  8.95 B | MTL,BLAS |       6 | tg128 |  16.58 ± 0.05 |
		| gpt-oss 20B MXFP4 MoE   | 11.27 GiB | 20.91 B | MTL,BLAS |       6 | pp512 | 615.94 ± 2.23 |
		| gpt-oss 20B MXFP4 MoE   | 11.27 GiB | 20.91 B | MTL,BLAS |       6 | tg128 |  42.85 ± 0.61 |

		Klein 4B completes a 1024px generation in 72 seconds.