Counterexample exhibiting ⨆ n, ⨅ 𝒮, EC c 𝒮 n < ⨅ 𝒮, ⨆ n, EC c 𝒮 n
s⋆-----+-----+·····+·····
| | | |
s₀,₀ s₀,₁ s₀,₂ s₀,ᵢ
⋮ | | ⋮
∞ s₁,₁ s₁,₂ ⋮
⋮ | ⋮
∞ s₂,₂ ⋮
⋮ sᵢ,ᵢ
∞ ⋮
∞
Setup:
- The MDP consists of the states s⋆ and sᵢ,ⱼ for all i and j, with actions drawn from ℕ.
- In the initial state s⋆ all actions are enabled (that is, all of ℕ).
- Every other state has only the action 0 enabled.
- There is a transition from s⋆ to s₀,ᵢ for every i ∈ ℕ, taken with action i.
- From every state sᵢ,ⱼ there is a transition to sᵢ₊₁,ⱼ.
- Every transition is non-probabilistic (i.e. has probability 1).
- The cost of every state is either 0 or ⊤: s⋆ and the states sᵢ,ⱼ with i < j have cost 0, while the states sᵢ,ⱼ with i ≥ j have cost ⊤.
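The state space and cost function above can be sketched in Lean as follows. This is a minimal sketch, assuming Mathlib's ENNReal for the extended non-negative reals; the constructor names init and node match the declarations listed further down, but the definition here is illustrative, not the library's own.

```lean
import Mathlib.Data.ENNReal.Basic

inductive State where
  | init              -- the initial state s⋆
  | node (i j : ℕ)    -- the states sᵢ,ⱼ
  deriving DecidableEq

/-- `s⋆` and `sᵢ,ⱼ` with `i < j` have cost `0`; `sᵢ,ⱼ` with `i ≥ j` has cost `⊤`. -/
noncomputable def cost : State → ENNReal
  | .init => 0
  | .node i j => if j ≤ i then ⊤ else 0
```

This agrees with the cost equations listed at the bottom of this page, where the node case is given as `if j ≤ i then ⊤ else 0`.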
Now consider the two orders of optimization, ⨆ n, ⨅ 𝒮, EC c 𝒮 n and ⨅ 𝒮, ⨆ n, EC c 𝒮 n.
In the first case the scheduler gets to make its choice based on n, and can thus pick an action large
enough that within n steps no state of the form sᵢ,ᵢ with cost ⊤ is reached. Thus the expected cost under
the ⨆⨅ order is 0.
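Concretely (writing c(s) for the cost of state s): for a fixed depth n the scheduler may pick action n in s⋆, so the first n steps visit only s⋆, s₀,ₙ, s₁,ₙ, …, sₙ₋₁,ₙ, all of which have i < n = j and hence cost 0. This gives

  ⨅ 𝒮, EC c 𝒮 n ≤ c(s⋆) + c(s₀,ₙ) + ⋯ + c(sₙ₋₁,ₙ) = 0,  and thus  ⨆ n, ⨅ 𝒮, EC c 𝒮 n = 0.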
In the second case we fix a scheduler first and only then a depth. If the scheduler picked action i
in s⋆, we can choose the depth i+1, which suffices to reach the ⊤-cost state sᵢ,ᵢ. Hence for every
scheduler some depth reaches a state with cost ⊤, and the expected cost under the ⨅⨆ order is ⊤.
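In the same notation: if the scheduler 𝒮 picks action i in s⋆, the resulting path s⋆, s₀,ᵢ, s₁,ᵢ, …, sᵢ,ᵢ ends after i+1 steps in the ⊤-cost state sᵢ,ᵢ, so

  ⨆ n, EC c 𝒮 n ≥ EC c 𝒮 (i+1) ≥ c(sᵢ,ᵢ) = ⊤,  and thus  ⨅ 𝒮, ⨆ n, EC c 𝒮 n = ⊤.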
This leads to iSup_iInf_EC_lt_iInf_iSup_EC.
Additionally, we can show the same for MDP.lfp_Φ, giving us iSup_iInf_EC_lt_lfp_Φ.
- choice {α : ℕ} : Step State.init α 1 (State.node 0 α)
- step {i j : ℕ} : Step (State.node i j) 0 1 (State.node (i + 1) j)
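Putting the two constructors above together, the transition relation can be reconstructed as the following inductive predicate. This is a sketch: the signature State → ℕ → ENNReal → State → Prop (source state, action, probability, target state) is inferred from the constructors' argument order, not stated explicitly on this page.

```lean
inductive Step : State → ℕ → ENNReal → State → Prop where
  /-- In s⋆ every action α ∈ ℕ is enabled, leading to s₀,α with probability 1. -/
  | choice {α : ℕ} : Step State.init α 1 (State.node 0 α)
  /-- In sᵢ,ⱼ only action 0 is enabled, stepping to sᵢ₊₁,ⱼ with probability 1. -/
  | step {i j : ℕ} : Step (State.node i j) 0 1 (State.node (i + 1) j)
```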
noncomputable instance MDP.Counterexample.A.instDecidableStep
    {c : State} {α : ℕ} {p : ENNReal} {c' : State}
Equations
- MDP.Counterexample.A.M.cost (MDP.Counterexample.A.State.node i j) = if j ≤ i then ⊤ else 0
- MDP.Counterexample.A.M.cost x✝ = 0