	rohany 13 hours ago
		> I always assumed that when one warp waits for results from a long latency instruction, another warp, potentially from another block can be scheduled in.

		Yes, that is correct. However, most MMA-style kernels that use the Tensor Cores need enough resources per block that only 1 block fits on each SM, so there is no other block on that SM whose warps could be scheduled in.
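
To make the "only 1 block fits on each SM" point concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the SM limits are round numbers in the range of a recent datacenter GPU, the two block configurations are made up, and the model ignores allocation granularity and per-block reserved shared memory. For a real kernel, the CUDA runtime's `cudaOccupancyMaxActiveBlocksPerMultiprocessor` reports the actual figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (illustrative values, not taken from the thread)."""
    shared_mem_bytes: int
    registers: int
    max_threads: int
    max_blocks: int


@dataclass(frozen=True)
class BlockUsage:
    """Resources requested by one thread block (hypothetical kernels)."""
    shared_mem_bytes: int
    threads: int
    registers_per_thread: int


def resident_blocks_per_sm(sm: SMLimits, block: BlockUsage) -> int:
    """Simplified occupancy model: the scarcest resource decides how many blocks fit."""
    if block.shared_mem_bytes > 0:
        by_smem = sm.shared_mem_bytes // block.shared_mem_bytes
    else:
        by_smem = sm.max_blocks
    by_regs = sm.registers // (block.threads * block.registers_per_thread)
    by_threads = sm.max_threads // block.threads
    return min(by_smem, by_regs, by_threads, sm.max_blocks)


# Assumed SM limits: 228 KiB shared memory, 64K registers, 2048 threads, 32 blocks.
SM = SMLimits(shared_mem_bytes=228 * 1024, registers=64 * 1024,
              max_threads=2048, max_blocks=32)

# Hypothetical MMA-style block: large shared-memory tiles and many registers per thread.
mma_block = BlockUsage(shared_mem_bytes=160 * 1024, threads=384, registers_per_thread=168)

# Hypothetical lightweight block for comparison.
light_block = BlockUsage(shared_mem_bytes=16 * 1024, threads=256, registers_per_thread=32)

if __name__ == "__main__":
    print("MMA-style blocks per SM:", resident_blocks_per_sm(SM, mma_block))
    print("Lightweight blocks per SM:", resident_blocks_per_sm(SM, light_block))
```

Under these assumed numbers the MMA-style block is limited to one resident block by both shared memory and registers, so any latency hiding has to come from warps within that same block, while the lightweight block leaves room for several co-resident blocks.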