It makes complete sense to me: highly specific models don't have much commercial value, and at-scale LLM training favours generalism.