porridgeraisin | 6 hours ago
The Bellman equations (exactly as written above) are not found in ML libraries, because they assume you know a model of the data. Most real-world RL is model-free RL, or, as in LLMs, "model is known but too big to practically use" RL. Apart from the resources you use (good ones in other comments already), try to get the initial mental model of the whole field right. That is important, since everything you read can then fit into the right place in that mental model. I will try to give one below.

- The absolute core raison d'être of RL as a separate field: the quality of the data you train on improves as your algorithm improves, as opposed to the rest of ML, where you have all your data beforehand.
- First, basic Bellman equation solving (code-wise, just solving a system of linear equations).
- Then an algorithm you will come across called policy iteration (code-wise, a bunch of for loops).
- Here you will be able to see how different parts of the algorithm become impossible in different setups, and what approximations can be made for each of them. This is where the first neural network, called a "function approximator" in RL literature, comes into play, and where you can recognise approximate versions of the Bellman equation.
- Here you learn the DDPG and SAC algorithms. Crucial. Called "actor-critic" in parlance.
- You also notice the problems of this approach, which arise because (a) you don't have much high-quality data and (b) learning recursively with neural networks is very unstable. This motivates stuff like PPO.
- Then you can take a step back, look at deep RL, and re-cast everything in normal ML terms. For example, TD learning (the term you would have used so far) can be re-cast as simply "data augmentation", which you do in ML all the time.
- At this point you should get into the weeds of actually engineering real RL algorithms at scale. Stuff like the Atari benchmarks.
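To make the first two steps concrete: here is a minimal sketch of policy iteration in NumPy, where policy evaluation is literally "solve the Bellman equation as a linear system". The MDP itself (the transition tensor and rewards) is made up purely for illustration, not from any real problem.

```python
import numpy as np

# A tiny made-up MDP with a *known* model: exactly the setting where
# the Bellman equations can be solved directly.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# P[a, s, s'] = transition probability, R[s, a] = expected reward
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

def evaluate(policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi,
    i.e. the Bellman equation as a system of linear equations."""
    P_pi = P[policy, np.arange(n_states)]     # (S, S')
    r_pi = R[np.arange(n_states), policy]     # (S,)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Policy iteration: evaluate, improve greedily, repeat until stable.
policy = np.zeros(n_states, dtype=int)
while True:
    v = evaluate(policy)
    q = R + gamma * np.einsum('ast,t->sa', P, v)  # Q(s, a)
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy, v)
```

Note how every piece of this depends on having P and R in hand; once they are unknown (or too big), each line has to be replaced by a sampled or approximated version, which is where the rest of the field comes from.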
You will find that in reality, the algorithms as learnt are more or less a template, and you need lots of problem-specific detailing to actually make them work. You will also learn engineering tricks that are crucial. This is mostly computer science stuff (increasing throughput on GPU etc., but correctly, without changing the model's assumptions).

- Learn goal-conditioned RL, imitation learning, and some model-based RL like AlphaZero/Dreamer after all of the above. You will be able to easily understand them in the overall context at this point. The first two are used in robotics quite a bit. You can run a few small robotics benchmarks at this point.
- Learn stuff like HRL and offline RL as extras, since they are not that practically relevant yet.
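For contrast with the model-based sketch above, the model-free side mentioned earlier (TD learning, the thing later re-cast as "data augmentation") fits in a few lines of tabular code. The environment here is the classic 5-state random walk, chosen because its true values are known; everything else is illustrative.

```python
import numpy as np

# Tabular TD(0) on a 5-state random-walk chain: states 1..5, with
# terminal states 0 and 6; reaching state 6 pays reward 1.
n_states, alpha, gamma = 5, 0.1, 1.0
V = np.zeros(n_states + 2)
rng = np.random.default_rng(0)

for _ in range(2000):
    s = 3                             # start in the middle
    while s not in (0, n_states + 1):
        s2 = s + rng.choice((-1, 1))  # uniform random policy
        r = 1.0 if s2 == n_states + 1 else 0.0
        # TD(0): bootstrap the target from V(s') instead of waiting
        # for the full return. This recursive self-targeting is the
        # same thing that becomes unstable with neural networks.
        V[s] += alpha * (r + gamma * V[s2] - V[s])
        s = s2
print(V[1:-1])  # converges toward the true values 1/6 .. 5/6
```

No model of the environment appears anywhere: the agent only sees sampled transitions, which is precisely the gap the function-approximation and actor-critic machinery above is built to handle at scale.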