It’s because that’s what most closely resembles the bulk of the tasks it was optimized for during pre-training.