Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Related paper that's a good read: https://arxiv.org/abs/1908.08962