Documentation

MDP

Markov Decision Processes (MDP) #

This module concerns countably infinite branching MDPs.

Main definitions #

theorem MDP.iInf_iSup_EC_comm_lfp_Φ_all_eq {State : Type u_1} {Act : Type u_2} {M : MDP State Act} [DecidableEq State] {c : M.Costs} [M.FiniteBranching] :
let S := {⨆ (n : ℕ), ⨅ (𝒮 : 𝔖[M]), EC c 𝒮 n, ⨆ (n : ℕ), ⨅ (ℒ : 𝔏[M]), EC c (↑ℒ) n, ⨅ (𝒮 : 𝔖[M]), ⨆ (n : ℕ), EC c 𝒮 n, ⨅ (ℒ : 𝔏[M]), ⨆ (n : ℕ), EC c (↑ℒ) n, lfp_Φ c}; ∀ (v₁ v₂ : S), v₁ = v₂

For finitely branching MDPs, the optimal expected cost coincides under all of the following definitions:

  • ⨆ n, ⨅ 𝒮 : 𝔖[M], EC c 𝒮 n: Take the infimum over potentially history-dependent schedulers at bounded path length, and then push the length to the limit.
  • ⨆ n, ⨅ ℒ : 𝔏[M], EC c ℒ n: Like the previous, but restricted to history-independent (Markovian) schedulers.
  • ⨅ 𝒮 : 𝔖[M], ⨆ n, EC c 𝒮 n: Push the lengths of paths to the limit, and then take the infimum over potentially history-dependent schedulers.
  • ⨅ ℒ : 𝔏[M], ⨆ n, EC c ℒ n: Like the previous, but restricted to history-independent (Markovian) schedulers.
  • M.lfp_Φ c: The least fixed point of the Bellman operator M.Φ.
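For intuition, the Bellman operator for expected cost is typically defined pointwise as (a sketch; the exact form of M.Φ, M.act, and M.P in this development may differ):

  Φ c f s = c s + ⨅ α ∈ M.act s, ∑ s', M.P s α s' * f s'

That is, the cost of the current state plus the best (infimum) one-step expected continuation under f. Since Φ c is monotone on the complete lattice of cost functions State → ℝ≥0∞, the Knaster–Tarski theorem guarantees that its least fixed point lfp_Φ c exists, and the theorem above states that this fixed point agrees with all four scheduler-based characterizations.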