Has anyone tried implementing something like System M's meta-control switching in practice? Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.

▲

robot-wrangler 10 hours ago | parent | next [-]

> Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.

If you like biomimetic approaches to computer science, there's evidence that we want something besides neural networks. Whether we call such secondary systems emotions, hormones, or whatnot doesn't really matter much if the dynamics are useful. It seems at least possible that studying alignment-related topics will get us closer than any perspective that's purely focused on learning. Coincidentally, Quanta has a piece on some related topics today: https://www.quantamagazine.org/once-thought-to-support-neuro...

▲

fallous 9 hours ago | parent | next [-]

The question is whether this eventually leads us back to genetic programming, and whether we can adequately avoid the problems of over-fitting to specific hardware that tended to crop up in the past.

▲

t-writescode 9 hours ago | parent | prev [-]

Or possibly “in addition to”, yeah. I think this is where it needs to go. We can’t keep training HUGE neural networks every 3 months and throwing out all the work we did, plus the billions of dollars in gear and training, just to use another model a few months later.

That loop is unsustainable. Active learning needs to be discovered / created.

▲

exe34 5 hours ago | parent [-]

if that's the argument for active learning, wouldn't the same objection apply there too? it learns something and 5 minutes later my old prompts are useless.

▲

t-writescode 3 hours ago | parent [-]

That depends on the goals of the prompts you use with the LLM:

* as a glorified natural language processor (like I have done), you'll probably be fine, maybe
* as someone to communicate with, you'll also probably be fine
* as a very basic prompt-follower? Like, natural language processing-level of prompt "find me the important words", etc. Probably fine, or close enough.
* as a robust prompt system with complicated logic each prompt? Yes, it will begin to fail catastrophically, especially if you're wanting to be repeatable.

I'm not sure that the general public is that interested in perfectly repeatable work, though. I think they're looking for consistent and improving work.
