The more conservative version of this is that they'd want distilled models even if only as a speculative decoder to stick in front of the main model. That's an obvious optimisation to make.