	▲	subharmonicon a day ago
		The blog post is about using an NVIDIA-specific tensor core API that they have built to get good performance. Modular has been pushing the notion that they are building technology that allows writing HW-vendor neutral solutions so that users can break free of NVIDIA's hold on high performance kernels. From their own writing: > We want a unified, programmable system (one small binary!) that can scale across architectures from multiple vendors—while providing industry-leading performance on the most widely used GPUs (and CPUs).
	▲	totalperspectiv a day ago
		They let you write a kernel for Nvidia or for AMD that takes full advantage of that vendor’s hardware, then throw in a compile-time if statement to switch which kernel to use based on the hardware available. So you can support either vendor with performance as good as the vendor’s own libraries. That’s not lock-in, to me at least. It’s not as good as the compiler magically producing optimized kernels for arbitrary hardware, fully agree there. But it’s a big step forward from CUDA/HIP.
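		To make the "compile-time if statement" idea concrete, here is a rough analogy in C++17 using `if constexpr`. This is not Mojo and not Modular's API: the `Vendor` enum, the two `scale_*_path` functions, and the `scale` dispatcher are all hypothetical names, and the "kernels" are plain CPU loops standing in for vendor-tuned GPU code. It only shows the shape of the pattern: one entry point, with the vendor-specific body chosen at compile time so the unused branch is never instantiated.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical vendor tag. A real toolchain would derive this from the
// compilation target instead of a hand-written template argument.
enum class Vendor { Nvidia, Amd };

// Stand-in for a kernel tuned for one vendor: a loop unrolled by four.
// (Assumption: placeholder CPU code, not real tensor core usage.)
inline void scale_nvidia_path(std::vector<float>& data, float factor) {
    std::size_t i = 0;
    for (; i + 4 <= data.size(); i += 4) {
        data[i] *= factor;
        data[i + 1] *= factor;
        data[i + 2] *= factor;
        data[i + 3] *= factor;
    }
    for (; i < data.size(); ++i) {  // leftover elements
        data[i] *= factor;
    }
}

// Stand-in for a kernel tuned for the other vendor: a simple loop.
inline void scale_amd_path(std::vector<float>& data, float factor) {
    for (float& x : data) {
        x *= factor;
    }
}

// Single entry point. The branch is resolved at compile time, so only the
// selected vendor's path is instantiated for a given V.
template <Vendor V>
std::string_view scale(std::vector<float>& data, float factor) {
    if constexpr (V == Vendor::Nvidia) {
        scale_nvidia_path(data, factor);
        return "nvidia";
    } else {
        scale_amd_path(data, factor);
        return "amd";
    }
}
```

		Calling `scale<Vendor::Nvidia>(v, 2.0f)` or `scale<Vendor::Amd>(v, 2.0f)` gives the same numerical result through different code paths. That is the trade-off described above: the caller sees one portable interface, but someone still has to write each vendor-specific body by hand.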