fragmede 3 days ago

A system that self-updates its weights is so obvious the only question is who will be the first to get there?

soulofmischief 3 days ago | parent | next [-]

It's not always as useful as you think from the perspective of a business trying to sell an automated service to users who expect reliability. Now you have to worry about waking up in the middle of the night to rewind your model to a last known good state, leading to real data loss as far as users are concerned.

Data and functionality become entwined and basically you have to keep these systems on tight rails so that you can reason about their efficacy and performance, because any surgery on functionality might affect learned data, or worse, even damage a memory.

It's going to take a long time to solve these problems.
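
For concreteness, the "rewind to a last known good state" scenario above might look something like this minimal sketch, with toy scalar weights and a hand-rolled checkpoint list standing in for whatever a real serving stack would use:

    import copy

    class SelfUpdatingModel:
        """Toy model whose weights drift as it learns online."""
        def __init__(self):
            self.weights = {"w": 0.0}
            self.checkpoints = []  # (step, snapshot) pairs

        def checkpoint(self, step):
            # Snapshot the current weights as a known-good state.
            self.checkpoints.append((step, copy.deepcopy(self.weights)))

        def online_update(self, delta):
            # Weight change driven by live traffic -- the risky part.
            self.weights["w"] += delta

        def rollback(self):
            # Rewind to the last known-good snapshot; everything learned
            # since then is gone, which users experience as data loss.
            step, snapshot = self.checkpoints[-1]
            self.weights = copy.deepcopy(snapshot)
            return step

    model = SelfUpdatingModel()
    model.checkpoint(step=0)
    model.online_update(0.3)     # learned something useful
    model.checkpoint(step=1)
    model.online_update(-7.0)    # learned something pathological overnight
    print(model.rollback(), model.weights)   # back to step 1; the bad update is gone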

danenania 3 days ago | parent | prev | next [-]

I’m not sure that self-updating weights is really analogous to “continuous learning” as humans do it. A memory data structure that the model can search efficiently might be a lot closer.

Self-updating weights could be more like epigenetics.
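
A minimal sketch of the "searchable memory data structure" idea, using bag-of-words overlap as a stand-in for real embeddings (all names hypothetical):

    from collections import Counter

    class MemoryStore:
        """External memory the model searches at inference time,
        instead of folding new facts into its weights."""
        def __init__(self):
            self.entries = []

        def add(self, text):
            self.entries.append((text, Counter(text.lower().split())))

        def search(self, query, k=3):
            q = Counter(query.lower().split())
            scored = [(sum((q & bag).values()), text) for text, bag in self.entries]
            return [text for score, text in sorted(scored, reverse=True)[:k] if score > 0]

    memory = MemoryStore()
    memory.add("User prefers metric units in reports")
    memory.add("Project Foo deploys on Fridays")
    print(memory.search("what units does the user prefer"))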

Jensson 3 days ago | parent | next [-]

Human neurons are self-updating, though. We aren't running directly on our genes; each cell uses our genes to determine how to connect to other cells, and then the cell learns how to process information based on what it hears from the cells it's connected to.

So genes would be a meta-model that updates the weights in the real model so it can learn how to process new kinds of things, and for stuff like facts you can use an external memory, just like humans do.

Without updating the weights in the model you will never be able to learn to process new things, like a new kind of math, since you learn that not by memorizing facts but by building new models for it.

HarHarVeryFunny 2 days ago | parent | prev | next [-]

There's a difference between memory and learning.

Would you rather your illness was diagnosed by a doctor, or by a plumber with access to a stack of medical books?

Learning is about assimilating lots of different sources of information, reconciling the differences, trying things out for yourself, learning from your mistakes, being curious about your knowledge gaps and contradictions, and ultimately learning to correctly predict outcomes/actions based on everything you have learnt.

You will soon see the difference in action, as Anthropic apparently agree with you that memory can replace learning and are going to be relying on LLMs with longer compressed context (i.e. memory) in place of the ability to learn. I guess this'll be Anthropic's promised 2027 "drop-in replacement remote worker" - not an actual plumber unfortunately (no AGI), but an LLM with a stack of your company's onboarding material. It'll have perfect (well, "compressed") recall of everything you've tried to teach it, or complained about, but will have learnt nothing from any of it.

danenania 2 days ago | parent [-]

I think my point is that when the doctor diagnoses you, she often doesn’t do so immediately. She is spending time thinking it through, and as part of that process is retrieving various pieces of relevant information from her memory (both long term and short term).

I think this may be closer to an agentic, iterative search (à la Claude Code) than direct inference using continuously updated weights. If it were the latter, there would be no process of thinking it through or trying to recall relevant details, past cases, papers she read years ago, and so on; the diagnosis would just pop out instantaneously.
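
Roughly, that "iterative search" framing looks like a loop of recall and revision rather than a single forward pass. A sketch, where retrieval and revision are crude stand-ins and not any particular product's mechanism:

    def recall(memory, query, k=2):
        # Stand-in retrieval: rank notes by keyword overlap with the query.
        words = set(query.lower().split())
        scored = [(len(words & set(m.lower().split())), m) for m in memory]
        return [m for score, m in sorted(scored, reverse=True)[:k] if score > 0]

    def revise(hypothesis, recalled):
        # Stand-in for the model reconsidering its working hypothesis.
        if recalled:
            return recalled[0], 0.6 + 0.2 * len(recalled)
        return hypothesis, 0.1

    def iterative_diagnosis(case, memory, max_steps=5, threshold=0.8):
        """Recall, revise, repeat: the answer emerges over several steps
        rather than popping out of one forward pass."""
        hypothesis, confidence, notes = None, 0.0, []
        for _ in range(max_steps):
            recalled = recall(memory, case + " " + " ".join(notes))
            hypothesis, confidence = revise(hypothesis, recalled)
            notes.extend(recalled)
            if confidence >= threshold:
                break
        return hypothesis, confidence

    casebook = ["chest pain with exertion suggests cardiac workup",
                "persistent cough plus fever suggests imaging"]
    print(iterative_diagnosis("patient reports chest pain when climbing stairs", casebook))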

HarHarVeryFunny 2 days ago | parent [-]

Yes, but I think a key part of learning is experimentation and the feedback loop of being wrong.

An agent, or doctor, may be reasoning over the problem they are presented with, combining past learning with additional sources of memorized or problem-specific data, but in that moment it's their personal expertise/learning that will determine how successful they are with this reasoning process and ability to apply the reference material to the matter at hand (cf the plumber, who with all the time in the world just doesn't have the learning to make good use of the reference books).

I think there is also a subtle problem, not often discussed: to act successfully, the underlying learning in choosing how to act has to have come from personal experience. It's basically the difference between being book smart and having personal experience, but in the case of an LLM it also applies to experience-based reasoning it may have been trained on. The problem is that when the LLM acts, what is in its head (context/weights) isn't the same as what was in the head of the expert whose reasoning it may be trying to apply, so it may be applying that reasoning outside of the context that made it valid.

How you go from being book smart, and having heard other people's advice and reasoning, to being an expert yourself is by personal practice and learning - learning how to act based on what is in your own head.

imtringued 2 days ago | parent | prev [-]

In spiking neural networks, the model weights are equivalent to dendrites/synapses, which can form anew and decay during your lifetime.

HarHarVeryFunny 2 days ago | parent | prev | next [-]

Sure, it's obvious, but it's only one of the missing pieces required for brain-like AGI, and really upends the whole LLM-as-AI way of doing things.

Runtime incremental learning is still going to be based on prediction failure, but now it's no longer failure to predict the training set, but rather requires closing the loop and having (multi-modal) runtime "sensory" feedback - what were the real-world results of the action the AGI just predicted (generated)? This is no longer an auto-regressive model where you can just generate (act) by feeding the model's own output back in as input, but instead you now need to continually gather external feedback to feed back into your new incremental learning algorithm.
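
One way to picture that change: each step becomes act, observe the real-world outcome, then update on the prediction error, rather than training against a fixed corpus. A toy closed-loop sketch (scalar weights, simulated environment, nothing LLM-specific):

    import random

    def act(weights, observation):
        # Stand-in policy: the model predicts/generates an action.
        return weights["bias"] + observation

    def observe_outcome(action):
        # Stand-in for the world responding to the action.
        return 0.5 * action + random.gauss(0, 0.1)

    def closed_loop_step(weights, observation, lr=0.1):
        """Act, wait for real feedback, then learn from the surprise."""
        action = act(weights, observation)
        predicted_outcome = weights["outcome_model"] * action
        actual_outcome = observe_outcome(action)
        error = actual_outcome - predicted_outcome
        weights["outcome_model"] += lr * error * action  # update on prediction failure
        return error

    weights = {"bias": 0.0, "outcome_model": 0.0}
    for obs in [1.0, 2.0, 1.5]:
        print(round(closed_loop_step(weights, obs), 3))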

For a multi-modal model the feedback would have to include image/video/audio data as well as text, but even if initial implementations of incremental learning restricted themselves to text, it still turns the whole LLM-based way of interacting with the model on its head - the model generates text-based actions to throw out into the world, and you now need to gather the text-based future feedback to those actions. With chat the feedback is fairly immediate, but with something like software development it is far more nebulous - the model makes a code edit, and the feedback only comes later when compiling, running, debugging, etc., or maybe when trying to refactor or extend the architecture in the future. In corporate use, the response to an AGI-generated e-mail or message might come in many delayed forms, and these then need to be anticipated, captured, and fed back into the model.
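
The delayed-feedback part suggests some kind of ledger that keeps actions open until their consequences show up and can be matched back to them. A sketch of that bookkeeping, with invented names and signals:

    from dataclasses import dataclass, field
    import time

    @dataclass
    class PendingAction:
        action_id: str
        content: str          # e.g. a code edit or an outgoing e-mail
        issued_at: float
        feedback: list = field(default_factory=list)

    class FeedbackLedger:
        """Tracks actions whose consequences arrive later (a failed build,
        a reply to an e-mail) so they can be matched to the action that
        caused them and handed to the incremental learner."""
        def __init__(self):
            self.pending = {}

        def record_action(self, action_id, content):
            self.pending[action_id] = PendingAction(action_id, content, time.time())

        def attach_feedback(self, action_id, signal):
            self.pending[action_id].feedback.append(signal)

        def ready_for_update(self):
            # Hand completed (action, feedback) pairs to the learning algorithm.
            done = [p for p in self.pending.values() if p.feedback]
            for p in done:
                del self.pending[p.action_id]
            return done

    ledger = FeedbackLedger()
    ledger.record_action("edit-42", "refactor payment module")
    ledger.attach_feedback("edit-42", "CI failed: 3 tests broken")
    print([(p.content, p.feedback) for p in ledger.ready_for_update()])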

Once you've replaced the simple LLM prompt-response mode of interaction with one based on continual real-world feedback, and designed the new incremental (Bayesian?) learning algorithm to replace SGD, maybe the next question is what model is being updated, and where does this happen? It's not at all clear that the idea of a single shared (between all users) model will work when you have millions of model instances all simultaneously doing different things and receiving different feedback on different timescales... Maybe the incremental learning now needs to be applied to a user-specific model instance (perhaps with some attempt to later share & re-distribute whatever it has learnt), even if that is still cloud based.
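
One naive version of the per-user idea: each user gets a delta on top of a shared base model, and the deltas are occasionally averaged back in (or thrown away). Purely illustrative; a real system would need something far more careful:

    class BaseModel:
        def __init__(self):
            self.weights = {"w": 1.0}

    class UserInstance:
        """Per-user copy that learns locally as a delta over the base;
        deltas can later be inspected, merged back, or discarded."""
        def __init__(self, base):
            self.base = base
            self.delta = {"w": 0.0}

        def local_update(self, grad, lr=0.1):
            self.delta["w"] -= lr * grad

        def effective_weight(self):
            return self.base.weights["w"] + self.delta["w"]

    def merge(base, instances):
        # Crude redistribution: average the per-user deltas into the base.
        base.weights["w"] += sum(i.delta["w"] for i in instances) / len(instances)

    base = BaseModel()
    users = [UserInstance(base) for _ in range(3)]
    users[0].local_update(grad=2.0)
    users[1].local_update(grad=-1.0)
    merge(base, users)
    print(base.weights, [round(u.effective_weight(), 3) for u in users])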

So... a lot of very fundamental changes need to be made, just to support self-learning and self-updates, and we haven't even discussed all the other equally obvious differences between LLMs and a full cognitive architecture that would be needed to support more human-like AGI.

tmountain 3 days ago | parent | prev | next [-]

I’m no expert, but it seems like self-updating weights requires a grounded understanding of the underlying subject matter, and that seems like a problem for current LLM systems.

imtringued 2 days ago | parent | prev | next [-]

I wonder when there will be proofs in theoretical computer science that an algorithm is AGI-complete, the same way there are proofs of NP-completeness.

Conjecture: A system that self updates its weights according to a series of objective functions, but does not suffer from catastrophic forgetting (performance only degrades due to capacity limits, rather than from switching tasks) is AGI-complete.

Why? Because it could learn literally anything!
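
The "does not suffer from catastrophic forgetting" clause is the hard part; techniques like elastic weight consolidation only approximate it by penalizing movement on weights that mattered for earlier tasks. A toy scalar illustration of that penalty (importances and losses made up):

    def ewc_loss(weights, new_task_loss, old_weights, importance, lam=1.0):
        """New-task loss plus a quadratic penalty anchoring weights that
        were important for previously learned tasks (EWC-style)."""
        penalty = sum(importance[k] * (weights[k] - old_weights[k]) ** 2
                      for k in weights)
        return new_task_loss(weights) + 0.5 * lam * penalty

    # Task A was solved at w = 2.0 and w mattered a lot for it;
    # task B on its own would prefer w = 0.0.
    old_weights = {"w": 2.0}
    importance = {"w": 5.0}
    new_task_loss = lambda wts: wts["w"] ** 2

    best = min((ewc_loss({"w": w / 10}, new_task_loss, old_weights, importance), w / 10)
               for w in range(0, 31))
    print(best)  # the penalty pulls the compromise toward the old optimum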

emporas 3 days ago | parent | prev | next [-]

But then it is a specialized intelligence, specialized to altering its weights. Reinforcement Learning doesn't work as well when the goal is not easily defined. It does wonders for games, but anything else?

Someone has to specify the goals, a human operator or another A.I. The second A.I. had better be an A.G.I. itself, otherwise its goals will not be significant enough for us to care.

fuckaj 3 days ago | parent | prev | next [-]

True. In the same way as making noises down a telephone line is the obvious way to build a million dollar business.
