# Markov Decision Processes (MDP)
This module concerns itself with countably infinite branching MDPs.
## Main definitions
* `MDP`: Core definition of MDPs.
* `MDP.FiniteBranching`: A class of MDPs where the enabled actions and successors of every state are finite.
* `MDP.Path`: Finite paths of MDPs.
* `MDP.Scheduler`: Schedulers resolve nondeterminism. Also known as strategies, policies, or adversaries.
* `MDP.Φ`: The Bellman operator (see the illustrative sketch after this list).
* `MDP.EC`: Expected total cost.
* `MDP.iInf_iSup_EC_comm_lfp_Φ_all_eq`: The different formalizations of the optimal expected cost are equivalent for finitely branching MDPs.
* `MDP.iSup_iSup_EC_eq_lfp_Ψ`: Fixed point characterization of the maximal expected cost.
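To make the role of the Bellman operator and its least fixed point concrete, here is a minimal self-contained Lean sketch on a toy finite MDP. All names in it (`ToyMDP`, `bellman`, `iter`, `ex`) are hypothetical and use `Float`-valued costs for simplicity; this is not the library's API, only an illustration of the technique that the definitions above formalize in full generality.

```lean
/-- A toy finite MDP (hypothetical, for illustration only): `cost s` is paid on
visiting `s`, and `acts s` lists the actions enabled in `s`, each given as a
distribution over successor states. -/
structure ToyMDP (n : Nat) where
  cost : Fin n → Float
  acts : Fin n → List (List (Fin n × Float))

/-- One application of a minimizing Bellman-style operator:
`Φ V s = cost s + min over enabled actions a of Σ (s', p) ∈ a, p * V s'`. -/
def bellman {n : Nat} (M : ToyMDP n) (V : Fin n → Float) (s : Fin n) : Float :=
  let expected (a : List (Fin n × Float)) : Float :=
    a.foldl (fun acc sp => acc + sp.2 * V sp.1) 0.0
  -- `1.0 / 0.0 = ∞` is the neutral starting value for the minimum, so a state
  -- with no enabled action gets infinite cost.
  M.cost s +
    (M.acts s).foldl
      (fun acc a => if acc ≤ expected a then acc else expected a) (1.0 / 0.0)

/-- Iterating from `V₀ = 0` approximates the least fixed point from below,
mirroring the `⨆ n, …` pattern in the theorem names above. -/
def iter {n : Nat} (M : ToyMDP n) : Nat → Fin n → Float
  | 0 => fun _ => 0.0
  | k + 1 => bellman M (iter M k)

-- State 0 costs 1 per visit and may either loop with probability 1/2 or move
-- to the absorbing zero-cost state 1; the optimal scheduler moves immediately.
def ex : ToyMDP 2 where
  cost := fun s => if s = 0 then 1.0 else 0.0
  acts := fun s =>
    if s = 0 then [[(0, 0.5), (1, 0.5)], [(1, 1.0)]] else [[(1, 1.0)]]

#eval iter ex 20 0  -- 1.0, the optimal expected total cost from state 0
```

The minimizing fold corresponds to resolving nondeterminism in the scheduler's favor: at every state the cheapest enabled action is chosen, which is the reading behind the `⨅` over schedulers in the theorem below.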
```lean
theorem MDP.iInf_iSup_EC_comm_lfp_Φ_all_eq {State : Type u_1} {Act : Type u_2}
    {M : MDP State Act} [DecidableEq State] {c : M.Costs} [M.FiniteBranching] :
```
For finitely branching MDPs, the optimal expected cost is equivalent under all of the following definitions (spelled out in conventional notation after this list):

* `⨆ n, ⨅ 𝒮 : 𝔖[M], EC c 𝒮 n`: Consider a potentially history dependent scheduler under bounded path length, and then push the length to the limit.
* `⨆ n, ⨅ ℒ : 𝔏[M], EC c ℒ n`: Like the previous, but restricted to history independent (Markovian) schedulers.
* `⨅ 𝒮 : 𝔖[M], ⨆ n, EC c 𝒮 n`: Push the lengths of paths to the limit, and then consider a potentially history dependent scheduler.
* `⨅ ℒ : 𝔏[M], ⨆ n, EC c ℒ n`: Like the previous, but restricted to history independent (Markovian) schedulers.
* `M.lfp_Φ c`: The least fixed point of the Bellman operator `M.Φ`.
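A conventional pencil-and-paper rendering of this chain of equalities, assuming the usual reading of `𝔖[M]` as all (potentially history dependent) schedulers and `𝔏[M]` as the Markovian ones:

$$
\sup_{n} \inf_{\mathfrak{S}} \mathrm{EC}(c, \mathfrak{S}, n)
= \sup_{n} \inf_{\mathfrak{L}} \mathrm{EC}(c, \mathfrak{L}, n)
= \inf_{\mathfrak{S}} \sup_{n} \mathrm{EC}(c, \mathfrak{S}, n)
= \inf_{\mathfrak{L}} \sup_{n} \mathrm{EC}(c, \mathfrak{L}, n)
= \mathrm{lfp}\, \Phi_{c}.
$$

The substance of the theorem is the exchange of $\inf$ and $\sup$: for finitely branching MDPs the two orders of passing to the limit agree, and both coincide with the least fixed point of the Bellman operator.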