lukko 9 hours ago

I've just started to try and learn the basics of RL and the Bellman Equation - are there any good books or resources I should look at? I think this post is beyond my current level.

I'm most interested in how the equation can be implemented step by step in an ML library - worked examples would be very helpful.

Thank you!

sardukardboard 7 hours ago | parent | next [-]

I worked through David Silver's RL course a while back; it has great explanations as he builds up the equations. It's light on implementation, but the intuitive side really complements more code-heavy examples that lack the "why" behind the equations.

https://davidstarsilver.wordpress.com/teaching/

ActivePattern 9 hours ago | parent | prev | next [-]

Reinforcement Learning by Sutton & Barto is an excellent introduction by two of the founders of the field.

Read here: http://incompleteideas.net/book/the-book-2nd.html

brandonb 7 hours ago | parent | prev | next [-]

OpenAI's spinning up in deep RL is free and pretty good: https://spinningup.openai.com/en/latest/

It includes both mathematical formulas and PyTorch code.

I found it a bit more practical than the Sutton & Barto book, which is a classic but doesn't cover some of the more modern methods used in deep reinforcement learning.

srean 8 hours ago | parent | prev | next [-]

I would recommend starting with one of the classics (not much deep RL in it):

https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutto...

This will have a gentler learning curve. After this you can move on to more advanced material.

The other resource I will recommend is everything by Bertsekas. In this context, his books on dynamic programming and neuro-dynamic programming.

Happy reading.

porridgeraisin 6 hours ago | parent | prev | next [-]

The Bellman equations (exactly as written above) are not implemented directly in ML libraries.

This is because they assume you know a model of the environment's dynamics. Most real-world RL is model-free RL. Or, as with LLMs, "the model is known but too big to use practically" RL.
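To make the model-free point concrete, here's a minimal tabular Q-learning sketch on a toy chain environment (the environment and all constants here are invented for illustration). The key thing to notice is that the update rule uses only sampled (s, a, r, s') transitions; the transition probabilities never appear in the learning code, which is exactly what distinguishes this from solving the Bellman equation with a known model:

```python
import random

# Toy environment, used only to *simulate* experience; the agent never sees its internals.
# States 0..3 on a chain; action 0 = left, 1 = right; reward 1 for reaching state 3.
def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(3, s + 1)
    r = 1.0 if s2 == 3 else 0.0
    return s2, r, s2 == 3

gamma, alpha, eps = 0.9, 0.1, 0.1
Q = [[0.0, 0.0] for _ in range(4)]   # Q[state][action]

random.seed(0)
for episode in range(2000):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Sample-based Bellman backup: no transition model appears here,
        # just the observed reward and next state.
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# Greedy policy per non-terminal state (should learn to go right everywhere).
print([max((0, 1), key=lambda x: Q[s][x]) for s in range(3)])
```

Compare the `target` line with the Bellman optimality equation: it is the same backup, but with the expectation over next states replaced by a single sample.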

Apart from the resources you use (good ones in other comments already), try to get the initial mental model of the whole field right. That matters because everything you read afterwards can then fit into the right place in that mental model. I will try to give one below.

- the absolute core raison d'etre of RL as a separate field: the quality of the data you train on improves only as your algorithm improves, as opposed to the rest of ML, where you have all your data beforehand.

- first, basic Bellman equation solving (code-wise, this is just solving a system of linear equations)

- an algo you will come across called policy iteration (code-wise, a bunch of for loops)

- here you will see how different parts of the algorithm become impossible in different setups, and what approximations can be made for each part (this is where the first neural network, called a "function approximator" in the RL literature, comes into play). Here you can recognise approximate versions of the Bellman equation.

- here you learn the DDPG and SAC algorithms. Crucial. Called "actor-critic" in the parlance.

- you also notice problems with this approach that arise because (a) you don't have much high-quality data and (b) learning recursively with neural networks is very unstable; this motivates methods like PPO.

- then you can take a step back, look at deep RL, and re-cast everything in normal ML terms. For example, TD learning (the term you would have used so far) can be re-cast as simply "data augmentation", which you do in ML all the time.

- at this point you should get into the weeds of actually engineering real RL algorithms at scale. Stuff like the Atari benchmarks. You will find that in reality the algorithms as learnt are more or less a template, and you need lots of problem-specific detailing to actually make them work. You will also learn engineering tricks that are crucial. This is mostly computer science stuff (increasing throughput on the GPU etc., but correctly, without changing the model's assumptions).

- learn goal-conditioned RL, imitation learning, and some model-based RL like AlphaZero/Dreamer after all of the above. You will be able to understand them easily in the overall context at this point. The first two are used in robotics quite a bit. You can run a few small robotics benchmarks at this point.

- learn stuff like HRL and offline RL as extras, since they are not that practically relevant yet.
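The first two bullets above (exact Bellman solving and policy iteration) really are just a linear solve plus a loop. A minimal sketch in NumPy, on a made-up 2-state, 2-action MDP (all the numbers are invented for illustration):

```python
import numpy as np

# Made-up MDP with a *known* model:
# P[a][s][s2] = transition probability, R[a][s] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.8, 0.2]],   # action 0
              [[0.2, 0.8], [0.1, 0.9]]])  # action 1
R = np.array([[1.0, 0.0],                 # action 0
              [0.0, 2.0]])                # action 1
gamma = 0.9
n_states = 2

def evaluate(policy):
    # Bellman equation for a fixed policy, V = R_pi + gamma * P_pi @ V,
    # solved exactly as the linear system (I - gamma * P_pi) V = R_pi.
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    R_pi = np.array([R[policy[s], s] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

def policy_iteration():
    policy = np.zeros(n_states, dtype=int)
    while True:
        V = evaluate(policy)
        # Greedy improvement: one-step lookahead through the known model.
        Q = R + gamma * P @ V            # shape (n_actions, n_states)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

policy, V = policy_iteration()
print(policy, V)
```

Note that `evaluate` needs the full transition tensor `P`; this is precisely the step that becomes impossible in the model-free setting and has to be replaced with sampled-experience approximations.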
