Counterexample exhibiting ⨆ n, ⨅ 𝒮, EC c 𝒮 n < ⨅ 𝒮, ⨆ n, EC c 𝒮 n
s⋆-----+-----+·····+·····
| | | |
s₀,₀ s₀,₁ s₀,₂ s₀,ᵢ
⋮ | | ⋮
∞ s₁,₁ s₁,₂ ⋮
⋮ | ⋮
∞ s₂,₂ ⋮
⋮ sᵢ,ᵢ
∞ ⋮
∞
Setup:
- The MDP consists of the states s⋆ and sᵢ,ⱼ for all i and j, with actions drawn from ℕ.
- In the initial state s⋆ all actions are enabled (that is, all of ℕ).
- Every other state has only the action 0 enabled.
- There is a transition from s⋆ to s₀,ᵢ for every i ∈ ℕ, taken with action i.
- From every state sᵢ,ⱼ there is a transition to sᵢ₊₁,ⱼ.
- Every transition is non-probabilistic (i.e. has probability 1).
- The cost of every state is either 0 or ⊤: s⋆ and the states sᵢ,ⱼ with i < j have cost 0, while the states sᵢ,ⱼ with i ≥ j have cost ⊤.
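The state space and cost function above can be sketched in Lean as follows. This is a minimal sketch, assuming Mathlib's ENNReal for the extended non-negative reals; the constructor names init and node match the declarations listed further down, but the definition here is illustrative, not the library's own.

```lean
import Mathlib.Data.ENNReal.Basic

inductive State where
  | init              -- the initial state s⋆
  | node (i j : ℕ)    -- the states sᵢ,ⱼ
  deriving DecidableEq

/-- `s⋆` and `sᵢ,ⱼ` with `i < j` have cost `0`; `sᵢ,ⱼ` with `i ≥ j` has cost `⊤`. -/
noncomputable def cost : State → ENNReal
  | .init => 0
  | .node i j => if j ≤ i then ⊤ else 0
```

This agrees with the cost equations listed at the bottom of this page, where the node case is given as `if j ≤ i then ⊤ else 0`.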
Now consider the two orders of optimization, ⨆ n, ⨅ 𝒮, EC c 𝒮 n and ⨅ 𝒮, ⨆ n, EC c 𝒮 n.
In the first case the scheduler gets to make its choice based on n, and can thus pick an action large
enough that within n steps no state of the form sᵢ,ᵢ with cost ⊤ is reached. Thus the expected cost under
the ⨆⨅ order is 0.
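Concretely (writing c(s) for the cost of state s): for a fixed depth n the scheduler may pick action n in s⋆, so the first n steps visit only s⋆, s₀,ₙ, s₁,ₙ, …, sₙ₋₁,ₙ, all of which have i < n = j and hence cost 0. This gives

  ⨅ 𝒮, EC c 𝒮 n ≤ c(s⋆) + c(s₀,ₙ) + ⋯ + c(sₙ₋₁,ₙ) = 0,  and thus  ⨆ n, ⨅ 𝒮, EC c 𝒮 n = 0.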
In the second case we fix a scheduler first and only then a depth. If the scheduler picked action i
in s⋆, we can choose the depth i+1, which suffices to reach the ⊤-cost state sᵢ,ᵢ. Hence for every
scheduler some depth reaches a state with cost ⊤, and the expected cost under the ⨅⨆ order is ⊤.
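In the same notation: if the scheduler 𝒮 picks action i in s⋆, the resulting path s⋆, s₀,ᵢ, s₁,ᵢ, …, sᵢ,ᵢ ends after i+1 steps in the ⊤-cost state sᵢ,ᵢ, so

  ⨆ n, EC c 𝒮 n ≥ EC c 𝒮 (i+1) ≥ c(sᵢ,ᵢ) = ⊤,  and thus  ⨅ 𝒮, ⨆ n, EC c 𝒮 n = ⊤.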
This leads to iSup_iInf_EC_lt_iInf_iSup_EC.
Additionally, we can show the same for MDP.lfp_Φ, giving us iSup_iInf_EC_lt_lfp_Φ.
- choice {α : ℕ} : Step State.init α 1 (State.node 0 α)
- step {i j : ℕ} : Step (State.node i j) 0 1 (State.node (i + 1) j)
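Putting the two constructors above together, the transition relation can be reconstructed as the following inductive predicate. This is a sketch: the signature State → ℕ → ENNReal → State → Prop (source state, action, probability, target state) is inferred from the constructors' argument order, not stated explicitly on this page.

```lean
inductive Step : State → ℕ → ENNReal → State → Prop where
  /-- In s⋆ every action α ∈ ℕ is enabled, leading to s₀,α with probability 1. -/
  | choice {α : ℕ} : Step State.init α 1 (State.node 0 α)
  /-- In sᵢ,ⱼ only action 0 is enabled, stepping to sᵢ₊₁,ⱼ with probability 1. -/
  | step {i j : ℕ} : Step (State.node i j) 0 1 (State.node (i + 1) j)
```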
noncomputable instance MDP.Counterexample.A.instDecidableStep
    {c : State} {α : ℕ} {p : ENNReal} {c' : State}
Equations
- MDP.Counterexample.A.M.cost (MDP.Counterexample.A.State.node i j) = if j ≤ i then ⊤ else 0
- MDP.Counterexample.A.M.cost x✝ = 0