These models never cost billions to train, and I doubt the final training run for a model like GPT-4 cost more than eight figures. The roughly 6 million figure is certainly cheaper, and I would attribute that to distillation.