robrenaud 4 days ago
The mapping from input size to output quality is not linear. This is why we are in the regime of "build nuclear power plants to power datacenters": fixed-size improvements in loss require exponential increases in parameters/compute/data.
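To make that concrete, here is a minimal sketch assuming a Kaplan/Chinchilla-style power law, L(C) = L_inf + (C0/C)^alpha. All constants below are illustrative, not fit to any real model:

    # Under a power law, each fixed drop in loss costs a
    # multiplicative increase in training compute.
    L_INF = 1.69   # hypothetical irreducible loss (nats/token)
    C0 = 1.0       # hypothetical reference compute (normalized units)
    ALPHA = 0.05   # illustrative exponent, roughly the L ~ C^-0.05 regime

    def compute_for_loss(target: float) -> float:
        """Invert the power law: compute needed to hit a target loss."""
        return C0 / (target - L_INF) ** (1 / ALPHA)

    # Each additional 0.1-nat improvement costs orders of magnitude more:
    for target in (2.2, 2.1, 2.0, 1.9, 1.8):
        print(f"loss {target:.1f} -> compute {compute_for_loss(target):.1e}x")

With these made-up constants, going from loss 2.2 to 1.8 multiplies compute by roughly thirteen orders of magnitude. The exact numbers depend on the fit, but that is the shape of the problem.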
brookst 4 days ago | parent
Most of the reason we are re-commissioning a nuclear power plant is demand for quantity, not quality. If demand for compute had scaled this fast in the 1970s, the sudden need for billions of CPUs would not have disproven Moore's law. It is also true that merely doubling the quantity of training data does not double output quality, but that's orthogonal to power demand at inference time. Even if output quality did double in that case, it would just mean that much more demand, and therefore that much more power.
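A toy back-of-envelope of that last point (every number here is hypothetical, chosen only to show the linearity): inference power scales with query volume, independent of per-query quality.

    QUERIES_PER_DAY = 1e9     # hypothetical demand
    JOULES_PER_QUERY = 3_000  # hypothetical energy per inference (~1 Wh)

    def inference_power_watts(queries_per_day: float) -> float:
        """Average grid draw implied by a given query volume."""
        return queries_per_day * JOULES_PER_QUERY / 86_400  # seconds/day

    # Doubling demand doubles draw, whatever the model's quality:
    for scale in (1, 2, 4, 8):
        mw = inference_power_watts(scale * QUERIES_PER_DAY) / 1e6
        print(f"{scale}x demand -> {mw:,.0f} MW average draw")

At these assumed figures, 1B queries/day is about 35 MW of average draw, and the only lever that moves it is the demand multiplier.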