Remix clone Hacker News

	AugSun 12 hours ago
		"We can run your dumbed down models faster": #The use of NVFP4 results in a 3.5x reduction in model memory footprint relative to FP16 and a 1.8x reduction compared to FP8, while maintaining model accuracy with less than 1% degradation on key language modeling tasks for some models.