tomrod 4 days ago

The core conditions under which the Bellman (recursive) formulation is equivalent to the sequence problem and delivers an optimal policy are pretty straightforward, and they're handled in Stokey and Lucas (with Prescott), Recursive Methods in Economic Dynamics:

[1] Discounting: The discount factor β ∈ (0,1) is crucial. It makes the infinite discounted sum of returns finite and makes value-function iteration converge, preventing "infinite accumulation" problems.
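
A quick worked check (assuming per-period returns are bounded by some constant B, as in [4]):

```latex
\Bigl|\sum_{t=0}^{\infty}\beta^{t}\,u(x_t,a_t)\Bigr|
  \;\le\; \sum_{t=0}^{\infty}\beta^{t} B
  \;=\; \frac{B}{1-\beta} \;<\; \infty
  \qquad \text{for } \beta\in(0,1),
% whereas at beta = 1 the bound is B * (number of periods), which blows up,
% and the discounted sum need not converge.
```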

[2] Compactness of state/action sets: The feasible action correspondence Γ(x) is nonempty, compact-valued, and continuous (both upper and lower hemicontinuous) in the state x. The state space X is compact (or at least the feasible set is bounded enough to avoid unbounded payoffs). A concrete example follows [3] below.

[3] Continuity: The return (or reward) function u(x,a) is continuous in (x,a). The transition law f(x,a) is continuous in (x,a).
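
A standard concrete case (my illustration, not something the book needs you to use) is the one-sector growth model, which satisfies [2] and [3]:

```latex
\Gamma(k) \;=\; \bigl[\,0,\; f(k) + (1-\delta)k\,\bigr],
\qquad
u(k,k') \;=\; U\bigl(f(k) + (1-\delta)k - k'\bigr),
% With the production function f and the utility function U continuous,
% Gamma is nonempty, compact-valued, and continuous, and u is continuous
% in (k, k') -- exactly conditions [2]-[3].
```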

[4] Bounded rewards: u(x,a) is bounded (often assumed continuous and bounded). This keeps the Bellman operator well-defined and ensures contraction mapping arguments go through.
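
One way to see why boundedness matters (a sup-norm sketch, with T the Bellman operator):

```latex
\|TV\|_{\infty} \;\le\; \|u\|_{\infty} + \beta\,\|V\|_{\infty},
\qquad
\|V^{*}\|_{\infty} \;\le\; \frac{\|u\|_{\infty}}{1-\beta},
% so T maps bounded (continuous) functions into bounded (continuous) functions,
% which is exactly what the contraction argument in [5] needs.
```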

[5] Contraction mapping property: With discounting and bounded payoffs, the Bellman operator is a contraction on the space of bounded continuous functions. This guarantees existence and uniqueness of the value function V.
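
A minimal numerical sketch of [5] (and of the policy selection in [6]): value iteration on a discretized one-sector growth model. The grid, functional forms, and parameters below are my own illustrative choices, not from the comment or the book.

```python
import numpy as np

# Discretized one-sector growth model: f(k) = k**alpha, log utility,
# full depreciation. All parameter values are illustrative.
alpha, beta = 0.3, 0.95
k_grid = np.linspace(0.05, 2.0, 200)        # compact, discretized state space
output = k_grid ** alpha

# Payoff for every (k, k') pair; infeasible choices (c <= 0) get -inf.
c = output[:, None] - k_grid[None, :]
u = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)

def bellman(V):
    """Bellman operator T: (TV)(k) = max_{k'} { u(k, k') + beta * V(k') }."""
    return np.max(u + beta * V[None, :], axis=1)

V = np.zeros_like(k_grid)
for _ in range(2000):
    V_new = bellman(V)
    gap = np.max(np.abs(V_new - V))         # sup-norm distance ||TV - V||
    # Contraction in action: this gap shrinks by at least a factor beta per pass.
    V = V_new
    if gap < 1e-8:
        break

# Greedy policy g(k) extracted from the converged value function, cf. [6].
policy = k_grid[np.argmax(u + beta * V[None, :], axis=1)]
```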

[6] Measurable selection for policies: Under the above continuity and compactness assumptions, the maximum in the Bellman equation is attained, and there exists a measurable policy function g(x) that selects optimal actions.
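
For completeness, the selection result in [6] comes from Berge's theorem of the maximum; a sketch of the statement under [2]-[3]:

```latex
G(x) \;=\; \arg\max_{a \in \Gamma(x)}
  \bigl\{\, u(x,a) + \beta\, V\bigl(f(x,a)\bigr) \,\bigr\}
% is nonempty, compact-valued, and upper hemicontinuous. If in addition u is
% strictly concave and Gamma(x) is convex-valued, G is single-valued, so the
% policy g(x) is a continuous (hence measurable) function.
```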