Markov Decision Processes (MDP)

This module concerns itself with countably infinite branching MDPs.

Main definitions

theorem MDP.iInf_iSup_EC_comm_lfp_Ī¦š’Ÿ_all_eq {State : Type u_1} {Act : Type u_2} {M : MDP State Act} [DecidableEq State] {c : M.Costs} [M.FiniteBranching] :
have S := {⨆ (n : ā„•), ⨅ (š’® : š”–[M]), EC c š’® n, ⨆ (n : ā„•), ⨅ (ā„’ : š”[M]), EC c (↑ℒ) n, ⨅ (š’® : š”–[M]), ⨆ (n : ā„•), EC c š’® n, ⨅ (ā„’ : š”[M]), ⨆ (n : ā„•), EC c (↑ℒ) n, OrderHom.lfp (Φ Optimization.Demonic c)}; āˆ€ (v₁ vā‚‚ : ↑S), v₁ = vā‚‚
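The theorem packs the five expressions into a set S and states that any two elements of ↑S coincide. A hedged sketch of how one might extract a single pairwise equality from this formulation (the membership side-goals and the `by simp` discharge are assumptions; the exact incantation may differ in the actual development):

```lean
-- Sketch only: extract one equality from the "all elements of S are equal"
-- statement. We assume the identifiers from the statement above are in scope
-- and that `simp` can discharge membership in the five-element set.
example {State Act : Type*} {M : MDP State Act} [DecidableEq State]
    {c : M.Costs} [M.FiniteBranching] :
    (⨆ (n : ā„•), ⨅ (š’® : š”–[M]), EC c š’® n)
      = OrderHom.lfp (Φ Optimization.Demonic c) :=
  -- Apply the theorem to the two members of interest, then project the
  -- resulting `Subtype` equality down to an equality of values.
  congrArg Subtype.val
    (MDP.iInf_iSup_EC_comm_lfp_Φš’Ÿ_all_eq ⟨_, by simp⟩ ⟨_, by simp⟩)
```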

For finitely branching MDPs, the following definitions of the optimal expected cost all coincide:

  • ⨆ n, ⨅ š’® : š”–[M], EC c š’® n: Consider a potentially history dependent scheduler under bounded length, and then push the length to the limit.
  • ⨆ n, ⨅ ā„’ : š”[M], EC c ā„’ n: Like the previous but limit to history independent (Markovian) schedulers.
  • ⨅ š’® : š”–[M], ⨆ n, EC c š’® n: Push the lengths of paths to the limit, and then consider a potentially history dependent scheduler.
  • ⨅ ā„’ : š”[M], ⨆ n, EC c ā„’ n: Like the previous but limit to history independent (Markovian) schedulers.
  • lfp (Φ š’Ÿ c): The least fixed point of the Bellman operator M.dΦ.