gyrovagueGeist 10 hours ago

I've always found curriculum learning incredibly hard to tune and calibrate reliably (even more so than many other RL approaches!).

The main challenges I run into: reward scales and horizon lengths can vary across tasks of different difficulty; it's hard to explore policy space effectively (keeping a multimodal distribution of strategies around for exploration, rather than overfitting on the small problems first); and catastrophic forgetting shows up when mixing curriculum levels or when introducing them too late.

Do any readers, or the author, have good heuristics for these? Or is it still so problem-dependent that a hyperparameter search for something that works in spite of these challenges is the go-to?

kywch 6 hours ago | parent [-]

I think Go-Explore (https://arxiv.org/abs/1901.10995) is promising. It'll provide automatic scaffolding and prevent catastrophic forgetting.
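
For anyone who hasn't read the paper, here is a minimal sketch of the archive loop at the heart of Go-Explore's exploration phase: pick a previously reached cell, return to it, explore from there, and keep any new or improved cells. Everything specific here is an assumption for illustration, not the paper's code: the toy deterministic 1-D chain environment, the names `step`, `GOAL`, and `go_explore`, and the simplification that "better" means a shorter trajectory rather than a higher score. The archive never drops a discovered cell, which is the sense in which earlier progress is retained while the frontier moves outward.

```python
import random

GOAL = 20  # hypothetical target position on a 1-D chain


def step(state, action):
    """Toy deterministic environment (an assumption): move -1 or +1, clamped to [0, GOAL]."""
    return max(0, min(GOAL, state + action))


def go_explore(iterations=5000, explore_steps=5, seed=0):
    rng = random.Random(seed)
    # Archive maps a cell (here simply the position) to the best known way of reaching it.
    archive = {0: {"traj": [], "visits": 0}}
    for _ in range(iterations):
        # Select a cell, favouring rarely visited ones (weight 1 / sqrt(visits + 1)).
        cells = list(archive)
        weights = [1.0 / (archive[c]["visits"] + 1) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights, k=1)[0]
        archive[cell]["visits"] += 1

        # "Return" to the cell. The environment is deterministic, so restoring
        # the stored state is equivalent to replaying the stored trajectory.
        state = cell
        traj = list(archive[cell]["traj"])

        # Explore from there with random actions.
        for _ in range(explore_steps):
            action = rng.choice([-1, 1])
            state = step(state, action)
            traj.append(action)
            known = archive.get(state)
            # Keep new cells, or a shorter route to an already known cell.
            if known is None or len(traj) < len(known["traj"]):
                visits = known["visits"] if known else 0
                archive[state] = {"traj": list(traj), "visits": visits}
            if state == GOAL:
                break
    return archive


if __name__ == "__main__":
    archive = go_explore()
    print("cells discovered:", len(archive))
    print("goal reached:", GOAL in archive)
```

The real algorithm works on downsampled observations as cells, uses score as well as trajectory length to decide which entry to keep, and follows exploration with a robustification phase that trains a policy from the found trajectories; none of that is shown here.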

If one can frame the problem as a competition, then self-play has been shown to work repeatedly.