Remix clone Hacker News


	▲	banana_giraffe 13 hours ago
		A quick benchmark of float32 copies using torch cuda->cuda copies, comparing some random machines:
		`Raptor Lake + 5080: 380.63 GB/s`
		`Raptor Lake (CPU for reference): 20.41 GB/s`
		`GB10 (DGX Spark): 116.14 GB/s`
		`GH200: 1697.39 GB/s`
		This is an "eh, it works" benchmark, but it should give you a feel for the relative performance of the different systems. In practice, this means I can get something like 55 tokens a sec running a larger model like gpt-oss-120b-Q8_0 on the DGX Spark.
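For anyone who wants to try something similar, here is a rough sketch of how a copy benchmark like this could be put together. It is not the script behind the numbers above: the function names, buffer size, iteration counts, and the choice to count each copied byte once (rather than read plus write) are all assumptions for illustration.

```python
import time


def bandwidth_gbs(bytes_moved, seconds):
    """Convert a byte count and elapsed time to GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def time_copies(copy_once, sync, iters=20, warmup=3):
    """Return the average seconds per call of copy_once() (hypothetical helper)."""
    for _ in range(warmup):
        copy_once()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        copy_once()
    sync()  # wait for any queued asynchronous work before stopping the clock
    return (time.perf_counter() - start) / iters


def torch_cuda_copy_gbs(n_floats=256 * 1024 * 1024, iters=20):
    """Time float32 device-to-device copies; needs PyTorch and a CUDA GPU."""
    import torch  # imported lazily so the helpers above work without torch

    src = torch.empty(n_floats, dtype=torch.float32, device="cuda")
    dst = torch.empty_like(src)
    secs = time_copies(lambda: dst.copy_(src), torch.cuda.synchronize, iters)
    # Assumption: count the copied bytes once, not read + write traffic.
    return bandwidth_gbs(src.numel() * src.element_size(), secs)
```

Calling `torch_cuda_copy_gbs()` on a CUDA machine returns one GB/s figure. The synchronize call matters because CUDA copies are queued asynchronously, so without it the timer mostly measures launch overhead.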
	▲	ekropotin 12 hours ago
		Nice! Thanks for that. 55 t/s is much better than I expected.