	SmartestUnknown 15 hours ago
		Regarding "2x faster than PyTorch" being a condition for tinygrad to come out of alpha: can they or someone else give more detail on which workloads PyTorch runs more than 2x slower than what the hardware can deliver? Most papers use standard components, and I assume PyTorch already implements those efficiently, reaching 50% or more of the performance that can be extracted from typical GPUs. If they mean more esoteric workloads that require writing custom kernels to get good performance out of the chips, then that's a different issue.
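		A back-of-the-envelope sketch of the arithmetic behind that question, assuming performance is capped by a fixed hardware peak and that a framework's efficiency can be summarized as a single fraction of that peak. The function name `max_speedup` and the sample utilization values are hypothetical and are not measurements of PyTorch or tinygrad.

```python
def max_speedup(utilization: float) -> float:
    """Upper bound on the speedup any competitor can reach over a baseline
    that already achieves `utilization` (a fraction of peak, in (0, 1])."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # A competitor running at 100% of peak is 1 / utilization times faster.
    return 1.0 / utilization


# Hypothetical baseline utilizations, chosen only to illustrate the bound.
for u in (0.25, 0.5, 0.75):
    print(f"baseline at {u:.0%} of peak: at most {max_speedup(u):.2f}x faster")
```

		Under these assumptions, a 2x win is only possible on workloads where the baseline sits at or below 50% of what the hardware can deliver, which is why the specific workloads matter.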