Counterexample exhibiting ⨅ 𝒮, ⨆ n, EC c 𝒮 n < ⨅ ℒ, ⨆ n, EC c ℒ n
              $0                $1               $0
       ┌───► s₁ ──────────────► s₂ ───────────► s₃ ───┐
 𝓅(α)  │     │      1-𝓅(α)             1        ▲     │ 1
       └─────┘                                  └─────┘
Setup:
- The MDP consists of three states `s₁`, `s₂`, and `s₃`, and actions `ℕ`.
- State `s₁` has all actions enabled (i.e. all of `ℕ`), while `s₂` and `s₃` only have `0` enabled.
- The MDP is parameterized by a probability function `𝓅 : ℕ → ℝ≥0∞` with `0 < 𝓅 < 1` that dictates the transition probabilities in `s₁`, such that `P(s₁, i, s₁) = 𝓅(i)` and `P(s₁, i, s₂) = 1 - 𝓅(i)` for all `i ∈ ℕ`.
- The remaining probabilities are `P(s₂, 0, s₃) = 1` and `P(s₃, 0, s₃) = 1`, with all others being `0`.
- States `s₁` and `s₃` have cost `0`, and `s₂` has cost `1`.
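The setup above can be made concrete with a small numeric sketch (not part of the formalization: `State`, `pF`, `P`, and `cost` are hypothetical names, and `ℝ≥0∞` is approximated by `Float`):

```lean
-- Minimal sketch of the setup above; `State`, `pF`, `P`, and `cost` are
-- hypothetical names (not the library's), and `ℝ≥0∞` is approximated by `Float`.
inductive State | s₁ | s₂ | s₃

open State

/-- Stand-in for the parameter `𝓅 : ℕ → ℝ≥0∞` with `0 < 𝓅 < 1`, already
instantiated with the concrete choice `𝓅(n) = (2 ^ (2 ^ n)⁻¹)⁻¹` used below. -/
def pF (n : Nat) : Float := 1.0 / Float.pow 2.0 (1.0 / Float.pow 2.0 n.toFloat)

/-- Transition probabilities `P(s, a, s')`; only `s₁` reacts to the action `a`. -/
def P (s : State) (a : Nat) (s' : State) : Float :=
  match s, s' with
  | s₁, s₁ => pF a        -- stay in s₁ with probability 𝓅(a)
  | s₁, s₂ => 1.0 - pF a  -- leave s₁ with probability 1 - 𝓅(a)
  | s₂, s₃ => 1.0         -- s₂ and s₃ only have action 0 enabled
  | s₃, s₃ => 1.0
  | _,  _  => 0.0

/-- Costs: only the transit state `s₂` is charged. -/
def cost : State → Float
  | s₂ => 1.0
  | _  => 0.0
```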
Note that if we were to ever leave `s₁` we would incur a cost of `1`, and thus in order to achieve the minimal cost (`0`) one would have to stay in `s₁` forever.
Now, there is no way to pick `0 < 𝓅 < 1` such that the outgoing probability `1 - 𝓅(α)` is zero, so we must instead try to minimize it. For a fixed `α`, the probability of staying in `s₁` for `n` steps is `𝓅(α)ⁿ`, which tends to `0` in the limit, and thus the probability of eventually leaving is `1`.
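Numerically (a standalone sketch with a hypothetical `stayFixed`, using `Float`), the survival probability `qⁿ` for any fixed staying probability `q < 1` indeed vanishes:

```lean
/-- For any fixed probability `q < 1` of staying, the chance of remaining in `s₁`
for `n` consecutive steps is `qⁿ`, which tends to `0`. (Numeric sketch only.) -/
def stayFixed (q : Float) (n : Nat) : Float := Float.pow q n.toFloat

#eval stayFixed 0.917 100         -- ≈ 1.7e-4  (0.917 ≈ 𝓅(3) for the 𝓅 chosen below)
#eval stayFixed 0.999999 10000000 -- ≈ 4.5e-5; still vanishes for any fixed q < 1
```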
As a Markovian scheduler will always pick the same action in the same state, we find ourselves in the above scenario, and will thus have an expected cost of `1` for any Markovian scheduler, regardless of the choice of `𝓅`.
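Continuing that sketch (hypothetical names, `Float` approximation): a cost of `1` is paid exactly when the path eventually leaves `s₁`, so the expected cost of a fixed-action scheduler approaches `1` no matter how `𝓅` is chosen:

```lean
/-- Expected cost accumulated within the first `n` steps when the same action is
picked at every step: `1` is paid iff `s₁` has been left by step `n`. (Sketch.) -/
def ecMarkov (q : Float) (n : Nat) : Float := 1.0 - stayFixed q n

#eval ecMarkov 0.917 100  -- ≈ 0.9998, and → 1 for any fixed q < 1
```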
The task now is to pick `𝓅` so as to exploit the history of a scheduled path and beat the Markovian scheduler. By carefully picking `𝓅(n) = (2 ^ (2 ^ n)⁻¹)⁻¹` and using the scheduler that picks an action based on the length of the scheduled path, i.e. `𝒮(π) = ‖π‖`, we find that in the limit the probability of staying (and of leaving) is `1/2`, and thus the expected cost is `1/2`.
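This limit can be checked with the numeric sketch from above (reusing the hypothetical `pF`, and assuming the first scheduled action is `1`, i.e. `‖π‖ = 1` for the initial one-state path): the probability of surviving the first `n` steps is `∏ k ∈ [1, n], 𝓅(k) = 2 ^ -(1 - 2⁻ⁿ)`, which tends to `2⁻¹ = 1/2`.

```lean
/-- Probability of staying in `s₁` for the first `n` steps when the action at
step `k` is the current path length `k`: `∏ k ∈ [1, n], 𝓅(k)`. (Sketch.) -/
def stayHistory (n : Nat) : Float :=
  (List.range n).foldl (fun acc k => acc * pF (k + 1)) 1.0

#eval stayHistory 10        -- ≈ 0.5003
#eval stayHistory 60        -- ≈ 0.5
#eval 1.0 - stayHistory 60  -- expected cost ≈ 1/2: cost 1 is paid iff s₁ is left
```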
This leads us to conclude `iInf_iSup_EC_lt_iInf_iSup_ECℒ`.
Equations
- MDP.Counterexample.C.p = { toFun := fun (n : ℕ) => (2 ^ (2 ^ n)⁻¹)⁻¹, property := MDP.Counterexample.C.p._proof_27 }