Remix clone Hacker News


	▲	emanuele-em 9 hours ago
		The finding that naive single-op benchmarks overestimate dispatch cost by ~20x is wild. Curious how much the torch-webgpu backend could close the gap with CUDA if you went aggressive on kernel fusion; a 53% improvement on Vulkan is already significant. Any plans to try WGSL-level custom kernels?
	▲	yu3zhou4 8 hours ago
		Honestly, there is a lot of room for improvement in torch-webgpu's performance. It needs community involvement, but the opportunities are definitely there.