We are on tiny 1-5T parameter models with local power stations.

We can reach Q models just by throwing resources at it. That’s a million times current B models.

Is this a known or quantifiable thing? I thought the limit had already been determined, i.e. that the existing models top out, and at some point it doesn't matter how much time or energy you let the model consume: it won't improve the result. And with regard to training parameters, I thought we were equally limited there, e.g. the existing models can't benefit from a larger parameter space.

I was under the impression that improvements are arriving via how the models are trained and how model prompting context is constructed, rather than just by how much data or how much energy is spent searching over the model space for a particular prompt.

Is there some evidence that we have not reached a plateau with just resource consumption on existing models?

	int_19h 9 days ago | parent [-]
		The existing models "top out" not because they stop getting better, but because scaling them further is uneconomical. What we do know is that a model "tops out" wrt training data: for a model of a given size, there's only so much training data you can feed it before you stop seeing gains. Conversely, if you already have a model of, say, 1T params that is "trained to capacity", then a model of 2T params needs roughly twice as much training data to fully utilize all those weights. That means the cost of training it is not 2x but 4x (twice as many params x twice as many tokens). And then of course serving it is 2x more expensive, but even with optimal training the gains aren't 2x. So it very quickly becomes uneconomical. A good example of that kind of model is (was) GPT-4.5. The prices and the consequent lack of demand show why companies don't really do that sort of thing anymore. But no, there's no evidence of a plateau as such. I'm not sure what "evidence that we have not reached a plateau" would even look like.
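
To put rough numbers on the "2x params means 4x training cost" arithmetic above, here is a minimal sketch. It is not from the comment itself: it assumes the common rule of thumb that training compute is about 6 * params * tokens FLOPs, and that a model "trained to capacity" needs tokens in proportion to its parameter count. The ratio of 20 tokens per parameter is purely illustrative, and the function names are made up for this example.

```python
# Rough training-cost sketch. Assumptions (not from the thread):
#   - training compute ~ 6 * params * tokens FLOPs (a common rule of thumb)
#   - "trained to capacity" means tokens grow linearly with params
#   - 20 tokens per parameter is an illustrative ratio, nothing more

TOKENS_PER_PARAM = 20  # assumed, illustrative only


def training_flops(params: float, tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Approximate training compute for a model trained to capacity."""
    tokens = params * tokens_per_param
    return 6 * params * tokens


def relative_cost(scale: float, base_params: float = 1e12) -> float:
    """Training compute of a model `scale` times larger, relative to the base model."""
    return training_flops(base_params * scale) / training_flops(base_params)


if __name__ == "__main__":
    # 1000x corresponds to going from a T-scale model to a Q-scale model.
    for scale in (1, 2, 10, 1000):
        print(f"{scale:>5}x params -> {relative_cost(scale):,.0f}x training compute")
```

Under these assumptions the cost grows with the square of the parameter count, so the constants (6 and 20) cancel out of the ratio. Doubling the model gives 4x the training compute, and serving cost grows on top of that.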

sterlind 9 days ago | parent | prev | next [-]

what is a B model vs. a Q model? what do these letters mean?

	brador 9 days ago | parent [-]
		B = billion parameters, T = trillion, Q = quadrillion.

jazzyjackson 9 days ago | parent | prev [-]

You cannot think fast enough when your wires are kilometers long. The only way up is in, and silicon transistors just cannot compete on density with biological brains; ergo, superintelligence is a pipe dream.

	fc417fc802 9 days ago | parent [-]
		Baseless assertions. Fab tech continues to improve. There's no reason ML model internals have to be strictly serial - in fact we're already seeing some shifts away from that.