# Markov Decision Processes (MDP)
This module concerns itself with countably infinite branching MDPs.
## Main definitions
* `MDP`: Core definition of MDPs.
* `MDP.FiniteBranching`: A class of MDPs where the enabled actions and successors of every state are finite.
* `MDP.Path`: Finite paths of MDPs.
* `MDP.Scheduler`: Schedulers resolve nondeterminism. Also known as strategies, policies, or adversaries.
* `MDP.Φ`: The Bellman operator (see the illustrative sketch after this list).
* `MDP.EC`: Expected total cost.
* `MDP.iInf_iSup_EC_comm_lfp_Φ_all_eq`: The different formalizations of the optimal expected cost are equivalent for finitely branching MDPs.
* `MDP.iSup_iSup_EC_eq_lfp_Ψ`: Fixed point characterization of the maximal expected cost.
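To make the role of the Bellman operator and its least fixed point concrete, here is a minimal self-contained Lean sketch on a toy finite MDP. All names in it (`ToyMDP`, `bellman`, `iter`, `ex`) are hypothetical and use `Float`-valued costs for simplicity; this is not the library's API, only an illustration of the technique that the definitions above formalize in full generality.

```lean
/-- A toy finite MDP (hypothetical, for illustration only): `cost s` is paid on
visiting `s`, and `acts s` lists the actions enabled in `s`, each given as a
distribution over successor states. -/
structure ToyMDP (n : Nat) where
  cost : Fin n → Float
  acts : Fin n → List (List (Fin n × Float))

/-- One application of a minimizing Bellman-style operator:
`Φ V s = cost s + min over enabled actions a of Σ (s', p) ∈ a, p * V s'`. -/
def bellman {n : Nat} (M : ToyMDP n) (V : Fin n → Float) (s : Fin n) : Float :=
  let expected (a : List (Fin n × Float)) : Float :=
    a.foldl (fun acc sp => acc + sp.2 * V sp.1) 0.0
  -- `1.0 / 0.0 = ∞` is the neutral starting value for the minimum, so a state
  -- with no enabled action gets infinite cost.
  M.cost s +
    (M.acts s).foldl
      (fun acc a => if acc ≤ expected a then acc else expected a) (1.0 / 0.0)

/-- Iterating from `V₀ = 0` approximates the least fixed point from below,
mirroring the `⨆ n, …` pattern in the theorem names above. -/
def iter {n : Nat} (M : ToyMDP n) : Nat → Fin n → Float
  | 0 => fun _ => 0.0
  | k + 1 => bellman M (iter M k)

-- State 0 costs 1 per visit and may either loop with probability 1/2 or move
-- to the absorbing zero-cost state 1; the optimal scheduler moves immediately.
def ex : ToyMDP 2 where
  cost := fun s => if s = 0 then 1.0 else 0.0
  acts := fun s =>
    if s = 0 then [[(0, 0.5), (1, 0.5)], [(1, 1.0)]] else [[(1, 1.0)]]

#eval iter ex 20 0  -- 1.0, the optimal expected total cost from state 0
```

The minimizing fold corresponds to resolving nondeterminism in the scheduler's favor: at every state the cheapest enabled action is chosen, which is the reading behind the `⨅` over schedulers in the theorem below.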
```lean
theorem MDP.iInf_iSup_EC_comm_lfp_Φ_all_eq {State : Type u_1} {Act : Type u_2}
    {M : MDP State Act} [DecidableEq State] {c : M.Costs} [M.FiniteBranching] :
```
For finitely branching MDPs, the optimal expected cost is equivalent under all of the following definitions (spelled out in conventional notation after this list):

* `⨆ n, ⨅ 𝒮 : 𝔖[M], EC c 𝒮 n`: Consider a potentially history dependent scheduler under bounded path length, and then push the length to the limit.
* `⨆ n, ⨅ ℒ : 𝔏[M], EC c ℒ n`: Like the previous, but restricted to history independent (Markovian) schedulers.
* `⨅ 𝒮 : 𝔖[M], ⨆ n, EC c 𝒮 n`: Push the lengths of paths to the limit, and then consider a potentially history dependent scheduler.
* `⨅ ℒ : 𝔏[M], ⨆ n, EC c ℒ n`: Like the previous, but restricted to history independent (Markovian) schedulers.
* `M.lfp_Φ c`: The least fixed point of the Bellman operator `M.Φ`.
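A conventional pencil-and-paper rendering of this chain of equalities, assuming the usual reading of `𝔖[M]` as all (potentially history dependent) schedulers and `𝔏[M]` as the Markovian ones:

$$
\sup_{n} \inf_{\mathfrak{S}} \mathrm{EC}(c, \mathfrak{S}, n)
= \sup_{n} \inf_{\mathfrak{L}} \mathrm{EC}(c, \mathfrak{L}, n)
= \inf_{\mathfrak{S}} \sup_{n} \mathrm{EC}(c, \mathfrak{S}, n)
= \inf_{\mathfrak{L}} \sup_{n} \mathrm{EC}(c, \mathfrak{L}, n)
= \mathrm{lfp}\, \Phi_{c}.
$$

The substance of the theorem is the exchange of $\inf$ and $\sup$: for finitely branching MDPs the two orders of passing to the limit agree, and both coincide with the least fixed point of the Bellman operator.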