Counterexample exhibiting ⨅ 𝒮, ⨆ n, EC c 𝒮 n < ⨅ ℒ, ⨆ n, EC c ℒ n
        $0          $1          $0
     ┌─►s₁─────────►s₂─────────►s₃─┐
𝓅(α) │   │  1-𝓅(α)       1     ▲  │ 1
     └───┘                     └──┘
Setup (instance):
- The MDP consists of three states s₁, s₂, and s₃, and actions ℕ.
- State s₁ has all actions enabled (i.e. all of ℕ), while s₂ and s₃ only have 0 enabled.
- The MDP is parameterized by a probability function 𝓅 : ℕ → ℝ≥0∞ with 0 < 𝓅 < 1, which dictates the probabilities in s₁ such that P(s₁, i, s₁) = 𝓅(i) and P(s₁, i, s₂) = 1 - 𝓅(i) for all i ∈ ℕ.
- The remaining probabilities are P(s₂, 0, s₃) = 1 and P(s₃, 0, s₃) = 1, with all others being 0.
- States s₁ and s₃ have cost 0 and s₂ has cost 1.
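As an informal illustration of the setup above, here is a hypothetical Python sketch (not part of the Lean development; all names are made up) of the transition-probability and cost functions:

```python
# Hypothetical sketch of the instance above; the Lean development defines it
# formally, the names here are illustrative only.

def P(p, s, a, t):
    """Transition probability P(s, a, t), parameterized by 𝓅 = p : ℕ → (0, 1)."""
    if s == "s1":                      # every action a ∈ ℕ is enabled in s₁
        if t == "s1":
            return p(a)                # stay in s₁ with probability 𝓅(a)
        if t == "s2":
            return 1 - p(a)            # leave to s₂ with probability 1 - 𝓅(a)
        return 0
    if s == "s2":                      # only action 0 enabled
        return 1 if (a, t) == (0, "s3") else 0
    if s == "s3":                      # only action 0 enabled; absorbing
        return 1 if (a, t) == (0, "s3") else 0
    return 0

def cost(s):
    """Only the intermediate state s₂ carries cost."""
    return 1 if s == "s2" else 0
```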
Note that if we were to ever leave s₁ we would incur a cost of 1, and thus in order to achieve the
minimal cost (0) one would have to stay in s₁ forever.
Now, there is no way to pick 0 < 𝓅 < 1 such that the outgoing probability 1 - 𝓅(α) is zero, so we
must instead try to minimize it.
For a fixed α the probability of staying n times is 𝓅(α)ⁿ, which in the limit is 0, and thus
the probability of eventually leaving is 1.
As a Markovian scheduler always picks the same action in the same state, we find ourselves in the
above scenario, and will thus have an expected cost of 1 for any Markovian scheduler, regardless
of the choice of 𝓅.
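A hypothetical numeric check of this (the fixed action α = 5 and the sampled values of n below are arbitrary choices, not from the formal development): under a constant action α the probability of still being in s₁ after n steps is 𝓅(α)ⁿ, which vanishes, so the probability of ever visiting s₂, and hence the expected cost, tends to 1.

```python
# Hypothetical numeric check: a Markovian scheduler picks the same action α in
# s₁ forever, so P(still in s₁ after n steps) = 𝓅(α)ⁿ → 0 and the expected cost
# (the probability of ever visiting s₂) tends to 1, for any fixed 𝓅(α) < 1.

def p(n):
    # the concrete parameter used later: 𝓅(n) = (2 ^ (2 ^ n)⁻¹)⁻¹ = 2^(-2^(-n))
    return 2.0 ** -(2.0 ** -n)

alpha = 5                              # any fixed action behaves the same way
for n in [1, 10, 100, 1000]:
    stay_n = p(alpha) ** n             # probability of staying n steps
    print(n, stay_n, 1 - stay_n)       # probability of having left → 1
```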
The task now is to pick 𝓅 so as to exploit the history of a scheduled path and beat the Markovian
scheduler.
By carefully picking 𝓅(n) = (2 ^ (2 ^ n)⁻¹)⁻¹ and using the scheduler that picks an action based
on the length of the scheduled path, such that 𝒮(π) = ‖π‖, we find that in the limit the
probability of staying (and of leaving) is 1/2, and thus the expected cost is 1/2.
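A hypothetical numeric check of this claim (assuming the initial single-state path has length 1, so the relevant product starts at n = 1): the probability of staying in s₁ forever is ∏_{n≥1} 𝓅(n) = 2^(-(2⁻¹ + 2⁻² + …)) = 2⁻¹ = 1/2, and therefore the expected cost is 1 - 1/2 = 1/2.

```python
# Hypothetical numeric check: with 𝓅(n) = (2 ^ (2 ^ n)⁻¹)⁻¹ = 2^(-2^(-n)) and the
# scheduler 𝒮(π) = ‖π‖, the probability of staying in s₁ forever is the product
# of 𝓅(n) over the successive path lengths n = 1, 2, 3, …, which equals
# 2^(-(1/2 + 1/4 + …)) = 1/2, so the expected cost is 1 - 1/2 = 1/2.

def p(n):
    return 2.0 ** -(2.0 ** -n)

stay_forever = 1.0
for n in range(1, 60):                 # the tail beyond n ≈ 60 is numerically negligible
    stay_forever *= p(n)
print(stay_forever)                    # ≈ 0.5: probability of never leaving s₁
print(1.0 - stay_forever)              # ≈ 0.5: expected cost, beating every Markovian scheduler
```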
This leads us to conclude iInf_iSup_EC_lt_iInf_iSup_ECℒ.
Equations
- MDP.Counterexample.C.p = { toFun := fun (n : ℕ) => (2 ^ (2 ^ n)⁻¹)⁻¹, property := MDP.Counterexample.C.p._proof_27 }