Markov Decision Processes (MDP) #
This module concerns itself with countably infinite branching MDPs.
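Countably infinite branching means a state may have countably many enabled actions, and an action countably many successor states. As a purely illustrative sketch (the names and fields below are hypothetical, not this module's actual `MDP` definition), such a transition function can be typed so that the successor sum ranges over countably many states:

```lean
import Mathlib

-- Hypothetical sketch of a countably infinite branching MDP; the actual
-- `MDP` structure in this module may be organized differently.
structure MDPSketch (State Act : Type) where
  /-- `P s a s'`: probability of stepping from `s` to `s'` under action `a`. -/
  P : State → Act → State → ENNReal

/-- An action is enabled in `s` when its successor probabilities sum to `1`.
With `State := ℕ` the `tsum` may range over infinitely many successors. -/
def MDPSketch.enabled {State Act : Type} (M : MDPSketch State Act)
    (s : State) (a : Act) : Prop :=
  (∑' s', M.P s a s') = 1
```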
Main definitions #
- `MDP`: Core definition of MDPs.
- `MDP.FiniteBranching`: A class of MDPs where the enabled actions and successors of every state are finite.
- `MDP.Path`: Finite paths of MDPs.
- `MDP.Scheduler`: Schedulers resolve nondeterminism. Also known as strategies, policies, adversaries, etc.
- `MDP.EC`: Expected total cost.
- `MDP.iInf_iSup_EC_comm_lfp_Φℒ_all_eq`: Equivalence of different formalizations of the optimal expected cost for finitely branching MDPs.
- `MDP.iSup_iSup_EC_eq_lfp_Φℒ`: Fixed point characterization of the maximal expected cost.
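To fix intuition for `MDP.EC`, the bounded expected total cost under a scheduler can be written in the usual textbook form: the accumulated cost of each length-`n` path, weighted by the probability the scheduler assigns to that path. This is only a suggestive rendering, not necessarily the literal Lean definition:

```latex
% Suggestive textbook form of bounded expected total cost; the Lean
% definition of MDP.EC may be organized differently.
\mathrm{EC}(c, \mathcal{S}, n)
  \;=\; \sum_{\pi \,:\, |\pi| = n}
        \Pr{}_{\!\mathcal{S}}[\pi] \cdot \Big( \sum_{i < n} c(\pi_i) \Big)
```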
```lean
theorem MDP.iInf_iSup_EC_comm_lfp_Φℒ_all_eq
    {State : Type u_1} {Act : Type u_2} {M : MDP State Act}
    [DecidableEq State] {c : M.Costs} [M.FiniteBranching] :
```
For finitely branching MDPs, the optimal expected cost coincides under all of the following
definitions:

- `⨆ n, ⨅ 𝒮 : 𝔖[M], M.EC c 𝒮 n`: Consider a potentially history dependent scheduler under bounded length, and then push the length to the limit.
- `⨆ n, ⨅ ℒ : 𝔏[M], M.EC c ℒ n`: Like the previous, but restricted to history independent (Markovian) schedulers.
- `⨅ 𝒮 : 𝔖[M], ⨆ n, M.EC c 𝒮 n`: Push the lengths of paths to the limit, and then consider a potentially history dependent scheduler.
- `⨅ ℒ : 𝔏[M], ⨆ n, M.EC c ℒ n`: Like the previous, but restricted to history independent (Markovian) schedulers.
- `lfp (Φℒ c)`: The least fixed point of the Bellman operator `M.dΦ`.
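For intuition, a Bellman operator for minimal expected cost conventionally has the following textbook form (again, this is suggestive only, not necessarily the module's literal definition):

```latex
% Standard textbook form of a Bellman operator for minimal expected cost:
% current cost plus the best (infimum) expected value over enabled actions.
\Phi(c)(f)(s) \;=\; c(s) \;+\; \inf_{a \in \mathrm{Act}(s)}
  \sum_{s'} P(s, a, s') \cdot f(s')
```

Since costs live in a complete lattice (such as `ENNReal`-valued functions) and this map is monotone, the Knaster–Tarski theorem guarantees that the least fixed point `lfp` exists.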