As a user, “fast” is almost the last thing I want from a model.
I suspect AI companies try to promote fast because it’s really a euphemism for “less inference compute” which is the real metric they would like to optimize.