convolvatron 4 hours ago

you're right. there are a couple of explanations that might have some merit, looking at it from the device perspective. one is that the underlying block size is really large, so it looks like a very large cache line that a sequential scan will almost always hit. it's also very likely that there are prefetchers running to try to hide the latency.
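
to make the "big cache line" point concrete, here's a rough sketch. it's a toy model, not a measurement of any real device: `block_fetches`, the 4096-byte block, the 64-byte record, and the one-block cache are all assumptions picked for illustration, and it ignores prefetching entirely.

```python
import random


def block_fetches(offsets, block_size):
    """Count block fetches when only the most recently read block is cached."""
    fetches = 0
    cached = None
    for off in offsets:
        block = off // block_size
        if block != cached:
            # Miss: the device has to deliver a different block.
            fetches += 1
            cached = block
    return fetches


BLOCK = 4096   # assumed device block size in bytes
RECORD = 64    # assumed record size in bytes
N = 10_000     # number of records read

# Sequential scan: records laid out back to back.
sequential = [i * RECORD for i in range(N)]

# Scattered reads over a region 100x larger (fixed seed for repeatability).
rng = random.Random(0)
scattered = [rng.randrange(N * 100) * RECORD for _ in range(N)]

seq_fetches = block_fetches(sequential, BLOCK)
rand_fetches = block_fetches(scattered, BLOCK)

print("sequential fetches:", seq_fetches)
print("scattered fetches: ", rand_fetches)
```

with these assumed sizes the sequential scan needs 157 fetches for 10,000 reads, because each fetched block serves 64 records in a row. the scattered pattern misses on nearly every read. a prefetcher would help the sequential case further by having the next block in flight before it's asked for.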