tomrod 4 days ago

The core conditions under which the Bellman (recursive) formulation is equivalent to the sequence problem and delivers an optimal policy are pretty straightforward, and they're handled in Stokey and Lucas (with Prescott), Recursive Methods in Economic Dynamics:

[1] Discounting: The discount factor β ∈ (0,1) is crucial. It makes the infinite discounted sum of returns finite and makes value-function iteration converge, preventing "infinite accumulation" problems.
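
A quick worked check (assuming per-period returns are bounded by some constant B, as in [4]):

```latex
\Bigl|\sum_{t=0}^{\infty}\beta^{t}\,u(x_t,a_t)\Bigr|
  \;\le\; \sum_{t=0}^{\infty}\beta^{t} B
  \;=\; \frac{B}{1-\beta} \;<\; \infty
  \qquad \text{for } \beta\in(0,1),
% whereas at beta = 1 the bound is B * (number of periods), which blows up,
% and the discounted sum need not converge.
```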

[2] Compactness of state/action sets: The feasible action correspondence Γ(x) is nonempty, compact-valued, and continuous (both upper and lower hemicontinuous) in the state x. The state space X is compact (or at least the feasible set is bounded enough to avoid unbounded payoffs). A concrete example follows [3] below.

[3] Continuity: The return (or reward) function u(x,a) is continuous in (x,a). The transition law f(x,a) is continuous in (x,a).
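
A standard concrete case (my illustration, not something the book needs you to use) is the one-sector growth model, which satisfies [2] and [3]:

```latex
\Gamma(k) \;=\; \bigl[\,0,\; f(k) + (1-\delta)k\,\bigr],
\qquad
u(k,k') \;=\; U\bigl(f(k) + (1-\delta)k - k'\bigr),
% With the production function f and the utility function U continuous,
% Gamma is nonempty, compact-valued, and continuous, and u is continuous
% in (k, k') -- exactly conditions [2]-[3].
```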

[4] Bounded rewards: u(x,a) is bounded (often assumed continuous and bounded). This keeps the Bellman operator well-defined and ensures contraction mapping arguments go through.
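
One way to see why boundedness matters (a sup-norm sketch, with T the Bellman operator):

```latex
\|TV\|_{\infty} \;\le\; \|u\|_{\infty} + \beta\,\|V\|_{\infty},
\qquad
\|V^{*}\|_{\infty} \;\le\; \frac{\|u\|_{\infty}}{1-\beta},
% so T maps bounded (continuous) functions into bounded (continuous) functions,
% which is exactly what the contraction argument in [5] needs.
```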

[5] Contraction mapping property: With discounting and bounded payoffs, the Bellman operator is a contraction on the space of bounded continuous functions. This guarantees existence and uniqueness of the value function V.
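
A minimal numerical sketch of [5] (and of the policy selection in [6]): value iteration on a discretized one-sector growth model. The grid, functional forms, and parameters below are my own illustrative choices, not from the comment or the book.

```python
import numpy as np

# Discretized one-sector growth model: f(k) = k**alpha, log utility,
# full depreciation. All parameter values are illustrative.
alpha, beta = 0.3, 0.95
k_grid = np.linspace(0.05, 2.0, 200)        # compact, discretized state space
output = k_grid ** alpha

# Payoff for every (k, k') pair; infeasible choices (c <= 0) get -inf.
c = output[:, None] - k_grid[None, :]
u = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)

def bellman(V):
    """Bellman operator T: (TV)(k) = max_{k'} { u(k, k') + beta * V(k') }."""
    return np.max(u + beta * V[None, :], axis=1)

V = np.zeros_like(k_grid)
for _ in range(2000):
    V_new = bellman(V)
    gap = np.max(np.abs(V_new - V))         # sup-norm distance ||TV - V||
    # Contraction in action: this gap shrinks by at least a factor beta per pass.
    V = V_new
    if gap < 1e-8:
        break

# Greedy policy g(k) extracted from the converged value function, cf. [6].
policy = k_grid[np.argmax(u + beta * V[None, :], axis=1)]
```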

[6] Measurable selection for policies: Under the above continuity and compactness assumptions, the maximum in the Bellman equation is attained, and there exists a measurable policy function g(x) that selects optimal actions.
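
For completeness, the selection result in [6] comes from Berge's theorem of the maximum; a sketch of the statement under [2]-[3]:

```latex
G(x) \;=\; \arg\max_{a \in \Gamma(x)}
  \bigl\{\, u(x,a) + \beta\, V\bigl(f(x,a)\bigr) \,\bigr\}
% is nonempty, compact-valued, and upper hemicontinuous. If in addition u is
% strictly concave and Gamma(x) is convex-valued, G is single-valued, so the
% policy g(x) is a continuous (hence measurable) function.
```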