| ▲ | brador 10 days ago | |||||||
We are on tiny 1-5T parameter models with local power stations. We can reach Q models just by throwing resources at it. That’s a million times current B models. | ||||||||
| ▲ | bdamm 10 days ago | parent | next [-] | |||||||
Is this a known or quantifiable thing? I thought that the limit had already been determined, i.e. the existing models top out, and at some point it doesn't matter how much time or energy you let the model consume; it won't improve the result. And with regard to training parameters, I thought we were equally limited there, e.g. the existing models can't benefit from a larger parameter space. I was under the impression that improvements are arriving via how the models are trained and how model prompting context is constructed, rather than just by how much data or how much energy is spent searching over the model space for a particular prompt. Is there some evidence that we have not reached a plateau with just resource consumption on existing models? | ||||||||
| ||||||||
| ▲ | sterlind 9 days ago | parent | prev | next [-] | |||||||
what is a B model vs. a Q model? what do these letters mean? | ||||||||
| ||||||||
| ▲ | jazzyjackson 9 days ago | parent | prev [-] | |||||||
You cannot think fast enough when your wires are kilometers long. The only way up is in, and silicon transistors just cannot compete on density with biological brains; ergo, superintelligence is a pipe dream. | ||||||||
| ||||||||