The article focuses on compute performance, but I wonder whether compute was ever the bottleneck, considering the memory bandwidth involved.