Counterexample exhibiting ⨆ n, ⨅ 𝒮, EC c 𝒮 n < ⨅ 𝒮, ⨆ n, EC c 𝒮 n
s⋆-----+------+·····+·····
|      |      |     |
s₀,₀   s₀,₁   s₀,₂  s₀,ᵢ
⋮      |      |     ⋮
∞      s₁,₁   s₁,₂  ⋮
       ⋮      |     ⋮
       ∞      s₂,₂  ⋮
              ⋮     sᵢ,ᵢ
              ∞     ⋮
                    ∞
Setup:

- The MDP consists of the states `s⋆` and `sᵢ,ⱼ` for all `i` and `j`, with actions in `ℕ`.
- The initial state `s⋆` has all actions enabled (that is, all of `ℕ`).
- Every other state has only the action `0` enabled.
- For every `i ∈ ℕ` there is a transition from `s⋆` to `s₀,ᵢ` with action `i`.
- For every state `sᵢ,ⱼ` there is a transition to `sᵢ₊₁,ⱼ`.
- Every transition is non-probabilistic (i.e. has probability `1`).
- The cost of a state is either `0` or `⊤`: `s⋆` and `sᵢ,ⱼ` with `i < j` have cost `0`, while `sᵢ,ⱼ` with `i ≥ j` has cost `⊤`.
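The setup above can be sketched as a small Python model (the names `actions`, `succ`, and `cost` are illustrative, not from the formalization; `float("inf")` stands in for `⊤`, and the action set in `s⋆` is truncated to a finite range):

```python
# A minimal sketch of the counterexample MDP.
# "init" models s⋆; a pair (i, j) models s_{i,j}.
TOP = float("inf")  # stands in for ⊤

def actions(state):
    """Enabled actions: all of ℕ in s⋆ (truncated here), only 0 elsewhere."""
    return range(10) if state == "init" else (0,)

def succ(state, action):
    """Deterministic successor: s⋆ --i--> s_{0,i} and s_{i,j} --0--> s_{i+1,j}."""
    if state == "init":
        return (0, action)
    i, j = state
    return (i + 1, j)

def cost(state):
    """s⋆ and s_{i,j} with i < j cost 0; s_{i,j} with i ≥ j costs ⊤."""
    if state == "init":
        return 0
    i, j = state
    return TOP if i >= j else 0
```

For example, following action `2` from `s⋆` yields the path `s₀,₂ → s₁,₂ → s₂,₂`, whose last state is the first one with cost `⊤`.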
Now consider the two orders of optimization, `⨆ n, ⨅ 𝒮, EC c 𝒮 n` and `⨅ 𝒮, ⨆ n, EC c 𝒮 n`.

In the first case the scheduler gets to make its choice based on `n`, and can thus choose an action such that, within depth `n`, the path never reaches a state like `sᵢ,ᵢ` with cost `⊤`. Thus the expected cost under the `⨆⨅` order is `0`.
In the second case we fix a scheduler first and then a depth. That means we can pick a depth, say `i+1`, where `i` is the action the scheduler picked in `s⋆`. In this way we will always be able to pick a depth that reaches a state with cost `⊤`. Thus the expected cost under the `⨅⨆` order is `⊤`.
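The two orders can be checked numerically on finite truncations. This is a sketch under the assumption that `EC c 𝒮 n` totals the state costs along the first `n` states of the (deterministic) path induced by the scheduler's choice in `s⋆`; the bound `N` and the helper `ec` are illustrative:

```python
TOP = float("inf")  # stands in for ⊤

def ec(action: int, depth: int) -> float:
    """Cost accumulated over the first `depth` states of the path
    s⋆, s_{0,a}, s_{1,a}, ... induced by choosing action a in s⋆.
    State s_{i,a} is reached at step i+1 and has cost ⊤ iff a ≤ i."""
    for step in range(1, depth):
        i = step - 1  # at state s_{i, action}
        if action <= i:
            return TOP
    return 0

N = 12  # truncation bound for the illustration

# ⨆ n, ⨅ 𝒮: for each depth n the scheduler may answer with action n,
# so the ⊤-cost diagonal state s_{n,n} lies beyond the horizon.
sup_inf = max(min(ec(a, n) for a in range(N)) for n in range(1, N))

# ⨅ 𝒮, ⨆ n: once the action a is fixed, depth a + 2 reaches s_{a,a}.
inf_sup = min(max(ec(a, n) for n in range(1, N + 2)) for a in range(N))

print(sup_inf, inf_sup)  # → 0 inf
```

The truncated computation mirrors the argument above: `sup_inf` stays at `0` while `inf_sup` is `⊤` for every fixed action.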
This leads to `iSup_iInf_EC_lt_iInf_iSup_EC`.

Additionally, we can show the same for `MDP.lfp_Φ`, giving us `iSup_iInf_EC_lt_lfp_Φ`.
The transition relation `Step` is given by two constructors:

- `choice {α : ℕ} : Step State.init α 1 (State.node 0 α)`
- `step {i j : ℕ} : Step (State.node i j) 0 1 (State.node (i + 1) j)`
The cost function unfolds to:

- `MDP.Counterexample.A.M.cost (MDP.Counterexample.A.State.node i j) = if j ≤ i then ⊤ else 0`
- `MDP.Counterexample.A.M.cost x✝ = 0` otherwise