| ▲ | geon 2 days ago |
| The LLM architectures we have now have reached their full potential already, so going further would require something completely different. It isn't a matter of refining the existing tech. The internet of 1997, by contrast, is virtually identical technologically to what we have today; the real change there has been sociological, not technological. To make a car analogy: the current LLMs are not the early cars, but the most refined horse-drawn carriages. No matter how much money is poured into them, you won't find the future there. |
|
| ▲ | mkl 2 days ago | parent | next [-] |
| Dial-up modems reached their full 56kbps potential in 1997, and going further required something completely different. It happened naturally to satisfy demand, and was done by many of the same companies and people; the change was technological, not sociological. I think we're probably still far from the full potential of LLMs, but I don't see any obstacles to developing and switching to something better. |
| |
▲ | volkl48 2 days ago | parent | next [-] | | I don't think that comparison works very well at all. We had plenty of better technologies both available and in planning; 56k modems were just the cost-effective lowest common denominator of their era. It's not nearly as clear that we have any proven, workable ideas for where to go beyond LLMs. | |
▲ | geon 16 hours ago | parent | prev [-] | | > Dial-up modems reached their full 56kbps potential in 1997 That's simply not true. Modems were basically the same tech in the DSL era, and using light instead of electricity is a very gradual refinement. > we're probably still far from the full potential of LLMs Then why are the returns diminishing so sharply? > I don't see any obstacles to developing and switching to something better. The obstacle is that it needs to be invented. There was nothing stopping Newton from discovering relativity either. We simply have no idea what the road forward even looks like. |
|
|
| ▲ | Enginerrrd 2 days ago | parent | prev | next [-] |
| The current generation of LLMs has convinced me that we already have the compute and the data needed for AGI; we just likely need a new architecture. But I really think such an architecture could be right around the corner. It appears to me like the building blocks are there for it; it would just take someone with the right luck and genius to make it happen. |
| |
▲ | visarga 2 days ago | parent | next [-] | | > The current generation of LLMs has convinced me that we already have the compute and the data needed for AGI; we just likely need a new architecture I think this is one of the greatest fallacies surrounding LLMs. This one, and the other one: scaling compute! The models are plenty fine. What they need is not better models or more compute; they need better data, or better feedback, to keep iterating until they reach the solution. Take AlphaZero, for example: it was a simple convolutional network, modest compared to LLMs and small relative to recent models, and yet it beat the best of us at our own game. Why? Because it had unlimited environment access to play games against other variants of itself. The same goes for the whole Alpha* family (AlphaStar, AlphaTensor, AlphaCode, AlphaGeometry and so on): trained with copious amounts of interactive feedback, they could reach or surpass top human level in specific domains. What AI needs is feedback, environments, tools, real-world interaction that exposes the limitations in the model and provides immediate help to overcome them. Not unlike human engineers and scientists - take their labs and experiments away and they can't discover shit. It's also called the ideation-validation loop: AI can ideate, but it needs validation from outside. That is why I insist the models are not the bottleneck. | | |
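To make the ideation-validation loop above concrete, a minimal sketch (all names here are hypothetical: `propose` stands in for any model generating candidates, `validate` for any external check such as a test suite, game engine, or lab experiment):

    # Hedged sketch of an ideation-validation loop, not any particular system's API.
    def ideation_validation_loop(propose, validate, max_iters=10):
        feedback = None
        for _ in range(max_iters):
            candidate = propose(feedback)       # the model ideates, conditioned on past feedback
            ok, feedback = validate(candidate)  # the environment judges, not the model
            if ok:
                return candidate                # validated solution
        return None                             # no validated solution within budget

The point of the sketch is that `validate` lives outside the model, which is exactly the "environments, tools, real world interaction" the comment argues for.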
▲ | geon 16 hours ago | parent [-] | | For AlphaZero, the "better data" was trivial: the environment of board games is extremely simple. It just can't be compared to language models. The problem with language is that there is no known correct answer. Everything is vague, ambiguous and open-ended. How would we even implement feedback for that? So yes, we do need new models. |
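To illustrate the asymmetry described here, a hedged sketch: a board game's entire feedback signal fits in a few lines of exact code (tic-tac-toe as the example), while no analogous few-line oracle exists for language.

    # Exact reward for tic-tac-toe; board is 9 chars of 'X', 'O', or ' '.
    LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

    def reward(board, player):
        for a, b, c in LINES:
            if board[a] == board[b] == board[c] != " ":
                return 1 if board[a] == player else -1  # win or loss, unambiguous
        return 0  # draw or game still in progress

    print(reward("XXX O  O ", "X"))  # 1: 'X' has the top row

There is no comparably short, exact reward("paragraph") for language, which is the feedback problem being pointed at.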
| |
▲ | netdevphoenix 2 days ago | parent | prev [-] | | > The current generation of LLMs has convinced me that we already have the compute and the data needed for AGI; we just likely need a new architecture This is likely true, but not for the reasons you think. This was arguably true 10 years ago too. A human brain uses approximately 100 watts per day and, unlike most models out there, the brain is ALWAYS in training mode. It has about 2 petabytes of storage. In terms of raw capabilities, we have been there for a very long time. The real challenge is finding the point where we can build something that is AGI-level with the stuff we have. Because right now, we might have the compute and data needed for AGI, but we might lack the tools needed to build a system that efficient. It's like a little dog trying to enter a fenced house: the shortest path between the dog and the house might not be accessible to that dog, because of its current capabilities (short legs, inability to jump high or to push through the fence standing in between), so a longer path might in practice be the closest one that actually reaches the house. In case it's not obvious, AGI is the house, we are the little dog, and the fence represents the current challenges to building AGI. | | |
| ▲ | Flashtoo 2 days ago | parent [-] | | The notion that the brain uses less energy than an incandescent lightbulb and can store less data than YouTube does not mean we have had the compute and data needed to make AGI "for a very long time". The human brain is not a 20-watt computer ("100 watts per day" is not right) that learns from scratch on 2 petabytes of data. State manipulations performed in the brain can be more efficient than what we do in silicon. More importantly, its internal workings are the result of billions of years of evolution, and continue to change over the course of our lives. The learning a human does over its lifetime is assisted greatly by the reality of the physical body and the ability to interact with the real world to the extent that our body allows. Even then, we do not learn from scratch. We go through a curriculum that has been refined over millennia, building on knowledge and skills that were cultivated by our ancestors. An upper bound of compute needed to develop AGI that we can take from the human brain is not 20 watts and 2 petabytes of data, it is 4 billion years of evolution in a big and complex environment at molecular-level fidelity. Finding a tighter upper bound is left as an exercise for the reader. | | |
▲ | netdevphoenix 2 days ago | parent [-] | | > it is 4 billion years of evolution in a big and complex environment at molecular-level fidelity. Finding a tighter upper bound is left as an exercise for the reader. You have great points there, and I agree. The only issue I take is with your remark above. Surely, by your own definition, this is not true: evolution by natural selection is not a deterministic process, so 4 billion years is just one of many possible periods of time needed, not necessarily the longest or the shortest. Also, re "The human brain is not a 20-watt computer ("100 watts per day" is not right)": I was merely saying that there exists an intelligence that consumes 20 watts per day, so it is possible to run an intelligence on that much energy per day. This and the compute bit refer not to the training costs but to the running costs; after all, it will be useless to hit AGI if we do not have enough energy or compute to run it for longer than half a millisecond, or the means to increase the running time. Obviously, designing and training AGI is going to take much more than that, just as the human brain did. But the path to the emergence of the human brain wasn't the most efficient one, given the inherent randomness of evolution by natural selection, so there is no need to pretend that all the circumstances around the development of the human brain apply to us: our process isn't random at all, nor is it parallel at a global scale. | |
| ▲ | Flashtoo 2 days ago | parent | next [-] | | > Evolution by natural selection is not a deterministic process so 4 billion years is just one of many possible periods of time needed but not necessarily the longest or the shortest. That's why I say that is an upper bound - we know that it _has_ happened under those circumstances, so the minimum time needed is not more than that. If we reran the simulation it could indeed very well be much faster. I agree that 20 watts can be enough to support intelligence and if we can figure out how to get there, it will take us much less time than a billion years. I also think that on the compute side for developing the AGI we should count all the PhD brains churning away at it right now :) | |
▲ | recursive 2 days ago | parent | prev [-] | | "Watts per day" is just not a sensible metric; the watt already has the time component built in. 20 watts is a rate of energy usage over time. |
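For the unit arithmetic (a sketch using the thread's assumed 20 W figure):

    # Watts are joules per second, so a sustained draw implies energy per day.
    POWER_W = 20                          # assumed brain power draw (J/s)
    SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 s
    energy_j = POWER_W * SECONDS_PER_DAY  # 1,728,000 J, about 1.7 MJ per day
    energy_kwh = energy_j / 3.6e6         # about 0.48 kWh per day
    print(energy_j, energy_kwh)

So the sensible phrasings are "20 watts" (a rate) or "about 0.5 kWh per day" (an amount), not "20 watts per day".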
|
|
|
|
|
| ▲ | ozgung 2 days ago | parent | prev | next [-] |
| > The LLM architectures we have now have reached their full potential already. How do we know that? |
| |
▲ | efficax 2 days ago | parent | next [-] | | What we can say right now is that we've hit the point of diminishing returns, and the only way we're going to get significantly more capable models is through a technological advance that we cannot foresee (and that may not come for decades, if it ever comes). | |
| ▲ | polynomial 2 days ago | parent | prev [-] | | Exactly. You're absolutely right to focus on that. |
|
|
| ▲ | tim333 2 days ago | parent | prev [-] |
| You could see some potential modifications. Some are already multimodal. You'd probably want something that changes the weights as time goes on, so they can keep learning. It might be more like steam engines needing to be converted to petrol engines. |