I believe their speedup is computed _assuming they can easily fix the correctness bugs in the kernels_.
In practice, even with slight differences in the kernel outputs, the model will feel almost lobotomized.