If it's really more expensive per token, it might have more parameters and so be able to hold more of the code's context/scope.
Rumors say it has 10 trillion parameters vs. 1 trillion.