Matrix Core Programming on AMD CDNA Architecture (rocm.blogs.amd.com)
11 points by salykova 5 days ago | 1 comment
phkahler 4 minutes ago

So from CDNA3 to CDNA4 they doubled fp16 and fp8 performance but cut fp32 and fp64 in half?

Wonder why the regression on non-AI workloads?