Remix clone Hacker News


	▲	connorturland 11 hours ago
		I finally extracted some useful signals about what results you can get on the DGX Station machines. A bit of news broke at the AI Engineer conference today. I would have preferred Kimi 2.7 Code numbers, but 2.5 was what I could get.

		- Kimi 2.5, 1.1T params: 40-50 tok/s total output across all users. Source: NVIDIA rep number. About 595GB of model weights; we still need the benchmark conditions.
		- Nemotron Ultra, 550B: ~35 tok/s at concurrency 1; scales to 4-5 concurrent users. Source: NVIDIA rep number. Useful because it includes a concurrency claim.
		- GLM-5.2-REAP, 504B: ~60 tok/s. Source: public 0xSero number from AI Engineer. Alec Fong says an earlier GLM NVFP4 attempt was ~25 tok/s; still missing exact quant, prefill, context, and memory residency/concurrency details.

		I also learned a lot about what it costs and when it's shipping. Full writeup at the link.
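
As a rough sanity check on the Kimi 2.5 figures, dividing the quoted weight size by the parameter count gives the average number of bits stored per parameter. This is only a back-of-the-envelope sketch: it assumes "595GB" means decimal gigabytes and that the figure covers all 1.1T parameters. `bits_per_param` is a hypothetical helper written for this illustration, not part of any library.

```python
def bits_per_param(weight_bytes: float, n_params: float) -> float:
    """Average storage per parameter, in bits."""
    return weight_bytes * 8 / n_params


GB = 1e9  # assumption: decimal gigabytes, not GiB

# Figures quoted in the comment: ~595GB of weights for a 1.1T-parameter model.
kimi_bits = bits_per_param(595 * GB, 1.1e12)
print(f"Kimi 2.5: {kimi_bits:.2f} bits/param")
```

Under those assumptions this works out to about 4.33 bits per parameter, which would be in line with a roughly 4-bit quantization, though the exact quant is not stated.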