topspin 4 days ago

> "After completing the T steps, the H-module incorporates the sub-computation’s outcome (the final state L) and performs its own update. This H update establishes a fresh context for the L-module, essentially “restarting” its computational path and initiating a new convergence phase toward a different local equilibrium."

So they let the low-level RNN bottom out, evaluate the result in the high-level module, and generate a new context for the low-level RNN. Rinse, repeat. The low-level RNN iterates toward a local equilibrium while the high-level module periodically kicks it with a fresh context to get better outputs. Loops within loops. Composition.
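
A minimal sketch of that nesting, with made-up update functions (l_step, h_step) rather than anything taken from the paper: the inner loop settles the low-level state under a fixed high-level context, and the outer loop lets the high-level state absorb the result and reset the context.

    import numpy as np

    # Hypothetical update functions, not the paper's architecture: the L-module
    # iterates under a fixed H-state, then the H-module absorbs the converged
    # L-state and sets a fresh context for the next convergence phase.

    def l_step(zL, zH, x, W):
        return np.tanh(W["L"] @ zL + W["H2L"] @ zH + W["X"] @ x)

    def h_step(zH, zL, W):
        return np.tanh(W["HH"] @ zH + W["L2H"] @ zL)

    def hrm_like_forward(x, W, n_cycles=4, t_steps=8):
        d = W["HH"].shape[0]
        zL, zH = np.zeros(d), np.zeros(d)
        for _ in range(n_cycles):          # slow outer loop (H-module)
            for _ in range(t_steps):       # fast inner loop (L-module, the "T steps")
                zL = l_step(zL, zH, x, W)  # settle toward a local equilibrium
            zH = h_step(zH, zL, W)         # incorporate the outcome, restart the context
        return zH

    # toy usage
    d, rng = 16, np.random.default_rng(0)
    W = {k: rng.normal(scale=0.3, size=(d, d)) for k in ["L", "H2L", "X", "HH", "L2H"]}
    print(hrm_like_forward(rng.normal(size=d), W).shape)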

Another interesting part:

> "Neuroscientific evidence shows that these cognitive modes share overlapping neural circuits, particularly within regions such as the prefrontal cortex and the default mode network. This indicates that the brain dynamically modulates the “runtime” of these circuits according to task complexity and potential rewards.

> Inspired by the above mechanism, we incorporate an adaptive halting strategy into HRM that enables `thinking, fast and slow'"

A scheduler that dynamically balances resources based on the necessary depth of reasoning and the available data.
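
The paper's actual halting head is trained with Q-learning; as a rough stand-in, here's a sketch in the spirit of adaptive computation time, where a hypothetical halt head accumulates a stopping probability so that easy inputs finish after a few reasoning segments and hard ones run to the full budget.

    import numpy as np

    # Hypothetical names throughout; an ACT-style cumulative halting probability
    # decides when to stop spending compute on further reasoning segments.

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def adaptive_reasoning(x, segment_fn, halt_w, max_segments=16, eps=0.01):
        state = np.zeros_like(halt_w)
        total_halt = 0.0
        for seg in range(max_segments):
            state = segment_fn(state, x)           # one full reasoning segment
            total_halt += sigmoid(halt_w @ state)  # scheduler: halt or keep thinking?
            if total_halt >= 1.0 - eps:
                return state, seg + 1              # easy inputs stop early
        return state, max_segments                 # hard inputs use the full budget

    # toy usage with a stand-in segment function
    d, rng = 8, np.random.default_rng(1)
    Wseg = rng.normal(scale=0.5, size=(d, d))
    seg_fn = lambda s, x: np.tanh(Wseg @ s + x)
    _, n_used = adaptive_reasoning(rng.normal(size=d), seg_fn, rng.normal(size=d))
    print("segments used:", n_used)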

I love how this paper cites parallels with real brains throughout. I believe AGI will be solved as the primitives we're developing are composed to extreme complexity, utilizing many cooperating, competing, communicating, concurrent, specialized "modules." It is apparent to me that the human brain must have this complexity, because it's the only feasible way evolution had to achieve cognition using slow, low-power tissue.

username135 4 days ago | parent [-]

As soon as I read about the H-module/L-module split, it immediately reminded me of the human brain.

esafak 3 days ago | parent [-]

Composition is the whole point of deep learning. Deep as in multilayer, multilevel.

dbagr 3 days ago | parent [-]

You need recursion at some point: you can't account for all possible combinations of scenarios, as you would need an infinite number of layers.

crystal_revenge 3 days ago | parent | next [-]

> infinite number of layers

That's not as impossible as it seems: Gaussian processes are equivalent to a neural network with infinitely many hidden units, and any multilayer NN can be approximated by one with a single, sufficiently large hidden layer.
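
To be fair, the Gaussian-process correspondence is for infinitely wide single-hidden-layer networks with random weights (Neal's result), and universal approximation only says a wide-enough single layer exists, not that it's practical to train. A toy illustration, with arbitrarily chosen sizes, using random hidden features and a least-squares readout:

    import numpy as np

    # Toy illustration, not a proof: random tanh features in one wide hidden
    # layer, with only the output weights fit by least squares, approximate a
    # smooth target closely.

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 200)[:, None]
    y = np.sin(2 * x).ravel()

    width = 2000                                   # the "single, larger layer"
    W, b = rng.normal(size=(1, width)), rng.normal(size=width)
    H = np.tanh(x @ W + b)                         # random hidden features
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)   # fit the output layer only

    print("max abs error:", np.max(np.abs(H @ coef - y)))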

topspin 3 days ago | parent [-]

"a single, larger layer of hidden units"

Does this not mean that the entire model must cycle to operate any given part? Division into concurrent "modules" (the term appearing in this paper) affords optimizing each module's update frequency independently and intentionally.

Also, what certainty is there that everything is best modelled with multilayer NNs? Diversity of algorithms, independently optimized, could yield benefits.

Further, can we hope that modularity will create useful points of observability? The inherent serialization that develops between modules could be analyzed, and possibly reveal great insights.

Finally, isn't there a possibility that AGI could be achieved more rapidly by factoring the various processes into discrete modules, as opposed to solving every conceivable difficulty in a monolithic manner, whatever the algorithm?

That's a lot of questions. Seems like identifying possible benefits is easy enough that this approach is worth exploring. We shall see, I suppose. At the very least we know the modularization of HRM has a valid precedent: real biological brains.

username135 3 days ago | parent [-]

It would not surprise me if all of these tangential advances in various models and approaches ultimately become part of a larger framework of modules designed to handle certain tasks, similar to how your medulla oblongata operates breathing and heart rate, your amygdala sorts out memory and hormone regulation, your cingulate gyrus helps control motor function, and so on.

We have a great example (us), we just need to hone and replicate it.

advael 3 days ago | parent | prev [-]

I mean, recurrence is an attempt to approximate recursive processes, no?