# Markov Decision Processes (MDP)
This module concerns itself with countably infinite branching MDPs.
## Main definitions
- `MDP`: Core definition of MDPs.
- `MDP.FiniteBranching`: A class of MDPs where the enabled actions and the successors of every state are finite.
- `MDP.Path`: Finite paths of MDPs.
- `MDP.Scheduler`: Schedulers resolve nondeterminism. Also known as a strategy, policy, adversary, etc.
- `MDP.Φ`: The Bellman operator (sketched below).
- `MDP.EC`: Expected total cost.
- `MDP.iInf_iSup_EC_comm_lfp_Φ_all_eq`: Equivalence of several formalizations of the optimal expected cost for finitely branching MDPs.
- `MDP.iSup_iSup_EC_eq_lfp_Ψ`: Fixed point characterization of the maximal expected cost.
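
For orientation, the Bellman operator for minimal expected cost classically has the following shape. This is a sketch, not the library's literal definition of `MDP.Φ`; the names `act` for the enabled actions and `P` for the transition probabilities are assumptions about the interface:

$$
\Phi(f)(s) \;=\; c(s) \;+\; \inf_{\alpha \in \mathrm{act}(s)} \sum_{s'} P(s, \alpha, s') \cdot f(s')
$$

Similarly, `EC c 𝒮 n` can be read as the expected accumulated cost $\mathbb{E}\big[\sum_{i < n} c(s_i)\big]$ over paths of length `n`, where the scheduler `𝒮` resolves the choice of action at every step.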
    theorem MDP.iInf_iSup_EC_comm_lfp_Φ_all_eq {State : Type u_1} {Act : Type u_2}
        {M : MDP State Act} [DecidableEq State] {c : M.Costs} [M.FiniteBranching] :
For finitely branching MDPs, the optimal expected cost is equivalent under all of the following definitions (see the chain of equalities after this list):

- `⨆ n, ⨅ 𝒮 : 𝔖[M], EC c 𝒮 n`: Consider a potentially history-dependent scheduler under bounded path length, then push the length to the limit.
- `⨆ n, ⨅ ℒ : 𝔏[M], EC c ℒ n`: Like the previous, but restricted to history-independent (Markovian) schedulers.
- `⨅ 𝒮 : 𝔖[M], ⨆ n, EC c 𝒮 n`: Push the lengths of paths to the limit, then consider a potentially history-dependent scheduler.
- `⨅ ℒ : 𝔏[M], ⨆ n, EC c ℒ n`: Like the previous, but restricted to history-independent (Markovian) schedulers.
- `M.lfp_Φ c`: The least fixed point of the Bellman operator `M.Φ`.
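
Spelled out as one chain (a paraphrase in the notation of the list above, not the literal Lean conclusion):

```
⨆ n, ⨅ 𝒮 : 𝔖[M], EC c 𝒮 n
  = ⨆ n, ⨅ ℒ : 𝔏[M], EC c ℒ n
  = ⨅ 𝒮 : 𝔖[M], ⨆ n, EC c 𝒮 n
  = ⨅ ℒ : 𝔏[M], ⨆ n, EC c ℒ n
  = M.lfp_Φ c
```

The exchange of `⨅` and `⨆` is the delicate step, and, as the theorem's name (`comm`) and its `FiniteBranching` hypothesis suggest, it is the finiteness of enabled actions and successors at every state that allows the infimum over schedulers to commute with the supremum over path lengths.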