muzani 4 days ago
I don't have a source for this (there are probably no sources from back then), but anecdotally, someone at an AI/ML talk said they just added more data and quality went up. Doubling the data doubled the quality. With other breakthroughs, people saw diminishing gains. It's sort of why Sam back then tweeted that he expected the amount of intelligence to double every N years. I have the feeling they kept at this until GPT-4o (which was a different kind of data).
robrenaud 4 days ago | parent
The mapping from input size to output quality is not linear. This is why we are in the regime of "build nuclear power plants to power datacenters": fixed-size improvements in loss require exponential increases in parameters/compute/data.
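A rough sketch of why, assuming a Kaplan-style power law L(N) = (N_c / N)^alpha relating parameter count to loss. The exponent and constant below are illustrative assumptions (roughly the order reported in scaling-law papers), not measured values; inverting the fit shows how fast the required parameter count blows up for each further drop in loss:

    # Illustrative sketch with assumed constants, not anyone's measured values.
    ALPHA = 0.076   # assumed scaling exponent
    N_C = 8.8e13    # assumed normalization constant (parameters)

    def loss(n_params: float) -> float:
        """Loss predicted by the assumed power law for a given parameter count."""
        return (N_C / n_params) ** ALPHA

    def params_needed(target_loss: float) -> float:
        """Invert the power law: parameter count needed to reach a target loss."""
        return N_C * target_loss ** (-1.0 / ALPHA)

    if __name__ == "__main__":
        # Halving the loss under this fit costs 2**(1/alpha), i.e. thousands of times more parameters.
        print(f"multiplier to halve loss: {2 ** (1 / ALPHA):,.0f}x")

        # Each further 0.1 absolute drop in loss needs a larger and larger multiplier.
        targets = [3.0, 2.9, 2.8, 2.7]
        for lo, hi in zip(targets[1:], targets[:-1]):
            ratio = params_needed(lo) / params_needed(hi)
            print(f"loss {hi:.1f} -> {lo:.1f}: ~{ratio:.2f}x more parameters")

Under these assumed numbers, halving the loss takes roughly a 9,000x increase in parameters, and each successive 0.1 drop in loss costs a bigger multiplier than the one before it.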