| ▲ | cs702 2 hours ago |
| The problem is even more fundamental: Today's models stop learning once they're deployed to production. There's pretraining, training, and finetuning, during which model parameters are updated. Then there's inference, during which the model is frozen. "In-context learning" doesn't update the model. We need models that keep on learning (updating their parameters) forever, online, all the time. |
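The frozen-inference vs. weight-update distinction described above can be sketched with a toy one-weight linear model (a minimal illustration, not any real training API): "inference" only reads the parameter, while an online-learning step folds each observed error straight back into it.

```python
# Toy one-parameter model: y = w * x. Contrasts frozen inference
# with an online SGD step that updates the weight after every example.

def predict(w, x):
    # "Inference": reads the weight, never writes it.
    return w * x

def online_step(w, x, y_true, lr=0.1):
    # One online-learning step: fold the prediction error back
    # into the weight immediately (SGD on squared error).
    error = predict(w, x) - y_true
    return w - lr * error * x

w = 0.0
for x, y in [(1.0, 2.0)] * 50:   # stream of observations of y = 2x
    w = online_step(w, x, y)

print(round(w, 2))  # close to 2.0 after the stream
```

Today's deployed LLMs only ever run the `predict` half of this loop; the `online_step` half happens once, before deployment, and then never again.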
|
| ▲ | raincole 8 minutes ago | parent | next [-] |
| > We need models that keep on learning (updating their parameters) forever, online, all the time. Yeah, that's the guaranteed way to get MechaHitler in your latent space. If the feedback loop is fast enough, I think it would finally kill the internet (in the 'dead internet theory' sense). Perhaps it's better for everyone, though. |
|
| ▲ | rabbitlord 4 minutes ago | parent | prev | next [-] |
| I think they can do in-context learning. |
|
| ▲ | embedding-shape an hour ago | parent | prev | next [-] |
| > We need models that keep on learning (updating their parameters) forever, online, all the time. Do we need that? Today's models are already capable in lots of areas. Sure, they don't match up to what the uberhypers are talking up, but technology seldom does. Doesn't mean what's there already cannot be used in a better way, if they could stop jamming it into everything everywhere. |
|
| ▲ | derefr an hour ago | parent | prev | next [-] |
| Doesn't necessarily need to be online. As long as:
1. there's a way to take many transcripts of inference over a period and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), which a model can be fine-tuned on as an offline batch process every day/week, such that a new version of the model can come out daily/weekly that hard-remembers everything you told it; and
2. in-context learning + external memory improves to the point that a model with the appropriate in-context "soft memories" behaves indistinguishably from a model that has had its weights updated to hard-remember the same info (at least when limited to the small amount of memories that can be built up within a single day/week);
...then you get the same effect.
Why is this an interesting model? Because, at least to my understanding, this is already how organic brains work! There's nothing to suggest that animals, even humans, are neuroplastic on a continuous basis. Rather, our short-term memory is seemingly stored as electrochemical "state" in our neurons (much like an LLM's context is "state", but more RNN-like, in the "a two-neuron cycle makes a flip-flop" sense), and our actual physical synaptic connectivity only changes during "memory reconsolidation", a process that mostly occurs during REM sleep.
And indeed, we see the same problem in humans and other animals: when we stay awake too long without REM sleep, our "soft memory" state buffer reaches capacity and we become forgetful, both in the sense of not being able to immediately recall some of the things that have happened since we last slept, and in the sense of later failing to persist some of those experiences when we do finally sleep. But this model also "works well enough" to be indistinguishable from remembering everything, provided we can get a decent amount of REM sleep every night. |
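The two-part scheme above (a soft context buffer during the day, an offline batch job that folds it into the weights every night/week) can be sketched as a toy class. All names here are illustrative stand-ins, not any real fine-tuning API:

```python
# Sketch of periodic offline consolidation: between consolidations,
# new facts live only in a "soft" context buffer; a batch job folds
# them into the frozen "weights" and bumps the model version.

class ConsolidatingModel:
    def __init__(self):
        self.hard_memory = {}   # stands in for fine-tuned weights
        self.soft_memory = {}   # stands in for in-context / external memory
        self.version = 0

    def observe(self, key, value):
        # During the day: only the soft buffer changes; weights stay frozen.
        self.soft_memory[key] = value

    def recall(self, key):
        # Soft memory shadows hard memory, so behavior is (ideally)
        # indistinguishable from having already updated the weights.
        return self.soft_memory.get(key, self.hard_memory.get(key))

    def consolidate(self):
        # Nightly/weekly offline batch job: distil the accumulated
        # transcripts into the weights and ship a new model version.
        self.hard_memory.update(self.soft_memory)
        self.soft_memory.clear()
        self.version += 1

m = ConsolidatingModel()
m.observe("user_name", "Ada")
assert m.recall("user_name") == "Ada"   # recalled pre-consolidation, via soft memory
m.consolidate()                         # the "REM sleep" step
assert m.recall("user_name") == "Ada"   # now hard-remembered
```

The indistinguishability claim is exactly condition 2 above: `recall` gives the same answer before and after `consolidate`, as long as the soft buffer doesn't overflow before the next batch run.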
| |
| ▲ | observationist an hour ago | parent [-] |
| It 100% needs to be online. Imagine you're trying to think about a new tabletop puzzle, and every time a puzzle piece leaves your direct field of view, you no longer know about that puzzle piece. You can try to keep all of the puzzle pieces within your direct field of view, but that divides your focus. You can hack that and make your field of view incredibly large, but that can distort your sense of the relationships between things, their physical and cognitive magnitude. Bigger context isn't the answer; there's a fundamental structure and function missing from the overall architecture.
What you need is memory that works when you process and consume information, at the moment of consumption. If you meet a new person, you immediately memorize their face. If you enter a room, it's instantly learned and mapped in your mind. Without that, every time you blinked after meeting someone new, it'd be a total surprise to see what they looked like. You might never learn to recognize and remember faces at all. Or puzzle pieces. Or whatever else the lack of online learning kept you from integrating; the value is in persistent, instant integration into an existing world model. You can identify problems like this for any modality, including text, audio, tactile feedback, and so on.
You absolutely, 100% need online, continuous learning to deal effectively with information at a human level, across all the domains of competence that extend to generalizing out of distribution. It's probably not the last problem that needs solving before AGI, but it is definitely one of them, and there might only be a handful left. Mammals instantly, upon perceiving a novel environment, map it, without even having to consciously make the effort. Our brains operate in a continuous, plastic mode for certain things. Not only that, this machinery adapts to abstractions: many of the automatic, reflexive functions that evolved to handle navigation let us simulate the future and predict risk and reward over multiple arbitrary degrees of abstraction, sometimes in real time. https://www.nobelprize.org/uploads/2018/06/may-britt-moser-l... |
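The puzzle-piece failure mode described above (a bounded "field of view" versus learning at the moment of consumption) can be shown in a few lines. The container names are illustrative, not any real architecture:

```python
from collections import deque

# Contrast a bounded "field of view" (context window), which silently
# drops pieces that scroll out, with an online learner that integrates
# each piece into a persistent world model the moment it is seen.

CONTEXT_LIMIT = 3

context_window = deque(maxlen=CONTEXT_LIMIT)  # in-context only
world_model = set()                            # online, persistent

for piece in ["corner", "edge", "sky", "tree", "house"]:
    context_window.append(piece)  # oldest pieces silently fall out
    world_model.add(piece)        # learned at the moment of consumption

print("corner" in context_window)  # False: scrolled out of view
print("corner" in world_model)     # True: permanently integrated
```

Making `CONTEXT_LIMIT` huge only delays the drop-off; the online store never forgets, which is the property being argued for.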
|
|
| ▲ | charcircuit an hour ago | parent | prev | next [-] |
| Models like Claude have been trained to update and reference memory for Claude Code (agent loops) independently and as part of compacting context. Current models have been trained to keep learning after being deployed. |
| |
| ▲ | ra 16 minutes ago | parent [-] | | Yes, but that's a very unsatisfactory definition of memory. |
|
|
| ▲ | 4b11b4 2 hours ago | parent | prev [-] |
| I'm not sure if you want models perpetually updating weights. You might run into undesirable scenarios. |
| |
| ▲ | cs702 2 hours ago | parent | next [-] | | Our brains, which are organic neural networks, are constantly updating themselves. We call this phenomenon "neuroplasticity." If we want AI models that are always learning, we'll need the equivalent of neuroplasticity for artificial neural networks. Not saying it will be easy or straightforward. There's still a lot we don't know! | | |
| ▲ | nemomarx 18 minutes ago | parent [-] | | How would you keep controls (safety restrictions, IP restrictions, etc.) in place with that, though? The companies selling models right now probably want to keep those fairly tight. |
| |
| ▲ | com2kid 2 hours ago | parent | prev | next [-] | | If done right, one step closer to actual AGI. That is the end goal after all, but all the potential VCs seem to forget that almost every conceivable outcome of real AGI involves the current economic system falling to pieces. Which is sorta weird. It is like if VCs in Old Regime france started funding the revolution. | | |
| ▲ | CorrectHorseBat an hour ago | parent [-] | | Yes, the planet got destroyed. But for a beautiful moment in time we created a lot of value for shareholders. And as for your comparison: they did fund the American revolution, which in turn was one of the sparks for the French revolution (or was that exactly the point you were making?) | | |
| ▲ | com2kid 20 minutes ago | parent [-] | | The funding of the American revolution is a fun topic but most people don't know about it so I don't bother dropping references to it. :D |
|
| |
| ▲ | 0xdeadbeefbabe 35 minutes ago | parent | prev | next [-] | | How about we just put them to bed once in a while? | |
| ▲ | bdj108 an hour ago | parent | prev [-] | | it is interesting |
|