Documentation

MDP.Counterexample.C

Counterexample exhibiting ⨅ 𝒮, ⨆ n, EC c 𝒮 n < ⨅ ℒ, ⨆ n, EC c ℒ n #

        $0          $1          $0
     ┌─►s₁─────────►s₂─────────►s₃─┐
𝓅(α) │  │   1-𝓅(α)        1     ▲  │ 1
     └──┘                       └──┘

Setup: (instance)

Note that if we were to ever leave s₁ we would get a incur a cost of 1, and thus in order to get minimal cost (0) one would have to stay in s₁ forever.

Now, there's no way to pick 0 < 𝓅 < 1 such that the outgoing probability 1 - 𝓅(α) is zero, we must instead try to minimize it.

For a fixed α the probability of staying n times 𝓅(α)ⁿ which in the limit is 0, and thus the probability of leaving is 1.

As a Markovian scheduler will always pick the same action in the same state, we find ourself in the above scenario, and will thus have an expected cost of 1 for any Markovian scheduler, regardless of choice of 𝓅.

The task now is to pick 𝓅 such to exploit the history of a scheduled path to beat the Markovian scheduler.

By carefully picking 𝓅(n) = (2 ^ (2 ^ n)⁻¹)⁻¹ and using the scheduler that picks an action based on the length of the scheduled path, such that, 𝒮(π) = ‖π‖, we find that in the limit the probability of staying (and of leaving) is 1/2, and thus the expected cost is 1/2.

This leads us to conclude iInf_iSup_EC_lt_iInf_iSup_ECℒ.

theorem List.take_append_cons_drop {α : Type u_1} {l : List α} {s : α} {i : } (hi : i < l.length) (h : l[i] = s) :
take i l ++ s :: drop (i + 1) l = l
Instances For
    Instances For
      @[simp]
      theorem MDP.Counterexample.C.P.zero_lt (𝓅 : P) (α : ) :
      0 < 𝓅 α
      @[simp]
      theorem MDP.Counterexample.C.P.lt_one (𝓅 : P) (α : ) :
      𝓅 α < 1
      @[simp]
      theorem MDP.Counterexample.C.P.ne_zero (𝓅 : P) (α : ) :
      ¬𝓅 α = 0
      @[simp]
      theorem MDP.Counterexample.C.P.ne_one (𝓅 : P) (α : ) :
      ¬𝓅 α = 1
      @[simp]
      theorem MDP.Counterexample.C.P.le_one (𝓅 : P) (α : ) :
      𝓅 α 1
      @[simp]
      theorem MDP.Counterexample.C.P.one_sub_ne_zero (𝓅 : P) (α : ) :
      ¬1 - 𝓅 α = 0
      @[simp]
      theorem MDP.Counterexample.C.P.add_one_sub (𝓅 : P) (α : ) :
      𝓅 α + (1 - 𝓅 α) = 1
      @[simp]
      theorem MDP.Counterexample.C.P.ne_top (𝓅 : P) (α : ) :
      ¬𝓅 α =
      inductive MDP.Counterexample.C.Step (𝓅 : P) :
      StateENNRealStateProp
      Instances For
        theorem MDP.Counterexample.C.step_iff (𝓅 : P) (a✝ : State) (a✝¹ : ) (a✝² : ENNReal) (a✝³ : State) :
        Step 𝓅 a✝ a✝¹ a✝² a✝³ a✝ = State.s₁ a✝² = 𝓅 a✝¹ a✝³ = State.s₁ a✝ = State.s₁ a✝² = 1 - 𝓅 a✝¹ a✝³ = State.s₂ a✝ = State.s₂ a✝¹ = 0 a✝² = 1 a✝³ = State.s₃ a✝ = State.s₃ a✝¹ = 0 a✝² = 1 a✝³ = State.s₃
        noncomputable instance MDP.Counterexample.C.instDecidableStep (𝓅 : P) {c : State} {α : } {p : ENNReal} {c' : State} :
        Decidable (Step 𝓅 c α p c')
        Equations
        @[simp]
        theorem MDP.Counterexample.C.s₁_iff (𝓅 : P) {α : } {p : ENNReal} {s' : State} :
        Step 𝓅 State.s₁ α p s' s' = State.s₁ p = 𝓅 α s' = State.s₂ p = 1 - 𝓅 α
        @[simp]
        theorem MDP.Counterexample.C.iff_s₁ (𝓅 : P) {s : State} {α : } {p : ENNReal} :
        Step 𝓅 s α p State.s₁ s = State.s₁ p = 𝓅 α
        @[simp]
        theorem MDP.Counterexample.C.s₂_iff (𝓅 : P) {α : } {p : ENNReal} {s' : State} :
        Step 𝓅 State.s₂ α p s' α = 0 p = 1 s' = State.s₃
        @[simp]
        theorem MDP.Counterexample.C.iff_s₂ (𝓅 : P) {s : State} {α : } {p : ENNReal} :
        Step 𝓅 s α p State.s₂ s = State.s₁ p = 1 - 𝓅 α
        @[simp]
        theorem MDP.Counterexample.C.s₃_iff (𝓅 : P) {α : } {p : ENNReal} {s' : State} :
        Step 𝓅 State.s₃ α p s' α = 0 p = 1 s' = State.s₃
        @[simp]
        theorem MDP.Counterexample.C.tsum_p (𝓅 : P) {c : State} {α : } {c' : State} :
        ∑' (p : { p : ENNReal // Step 𝓅 c α p c' }), p = ∑' (p : ENNReal), if Step 𝓅 c α p c' then p else 0
        noncomputable def MDP.Counterexample.C.M (𝓅 : P) :
        Equations
        Instances For
          @[simp]
          theorem MDP.Counterexample.C.M.act_eq (𝓅 : P) :
          (M 𝓅).act = fun (s : State) => if s = State.s₁ then Set.univ else {0}
          @[simp]
          theorem MDP.Counterexample.C.𝒮_s₂ (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} :
          @[simp]
          theorem MDP.Counterexample.C.𝒮_s₃ (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} :
          noncomputable def MDP.Counterexample.C.ℒ_a (𝓅 : P) (a : ) :
          𝔏[M 𝓅]
          Equations
          Instances For
            noncomputable def MDP.Counterexample.C.𝒮_len (𝓅 : P) (a : ) :
            𝔖[M 𝓅]

            Picks the action proportional to the length of the scheduled path

            Equations
            • One or more equations did not get rendered due to their size.
            Instances For
              @[reducible, inline]
              noncomputable abbrev MDP.Counterexample.C.𝒮_s₁ {𝓅 : P} (𝒮 : 𝔖[M 𝓅]) :
              Equations
              Instances For
                @[simp]
                theorem MDP.Counterexample.C.EC_succ_s₃ (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} {n : } :
                @[simp]
                theorem MDP.Counterexample.C.EC_succ_s₂ (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} {n : } :
                EC M.cost 𝒮 n State.s₂ = if n = 0 then 0 else 1
                theorem MDP.Counterexample.C.EC_succ_s₁ (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} {n : } :
                EC M.cost 𝒮 (n + 1) State.s₁ = 𝓅 (𝒮_s₁ 𝒮) * EC M.cost (𝒮.specialize State.s₁ State.s₁, ) n State.s₁ + if n = 0 then 0 else 1 - 𝓅 (𝒮_s₁ 𝒮)
                noncomputable def MDP.Counterexample.C.𝒮_x (𝓅 : P) (𝒮 : 𝔖[M 𝓅]) :
                𝔖[M 𝓅]

                Specializes the given scheduler with a chain of n repetitions of [.s₁ ↦ .s₁] s.t. 𝒮[.s₁ ↦ .s₁]^n.

                Equations
                Instances For
                  theorem MDP.Counterexample.C.𝒮_x_add (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} {n m : } :
                  𝒮_x 𝓅 (𝒮_x 𝓅 𝒮 n) m = 𝒮_x 𝓅 𝒮 (n + m)
                  theorem MDP.Counterexample.C.𝒮_x_eq_alt (𝓅 : P) {n : } (𝒮 : 𝔖[M 𝓅]) :
                  𝒮_x 𝓅 𝒮 n = 𝒮_x_alt 𝓅 𝒮 n
                  @[simp]
                  theorem MDP.Counterexample.C.𝒮_x_zero (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} :
                  𝒮_x 𝓅 𝒮 0 = 𝒮
                  theorem MDP.Counterexample.C.iSup_EC_succ_s₁ (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} :
                  ⨆ (n : ), EC M.cost 𝒮 n State.s₁ = (𝓅 (𝒮_s₁ 𝒮) * ⨆ (n : ), EC M.cost (𝒮.specialize State.s₁ State.s₁, ) n State.s₁) + (1 - 𝓅 (𝒮_s₁ 𝒮))
                  theorem MDP.Counterexample.C.iSup_EC_succ_eq_iSup_EC (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} :
                  ⨆ (n : ), EC M.cost 𝒮 (n + 1) State.s₁ = ⨆ (n : ), EC M.cost 𝒮 n State.s₁
                  theorem MDP.Counterexample.C.iSup_EC_eq (𝓅 : P) {𝒮 : 𝔖[M 𝓅]} :
                  ⨆ (n : ), EC M.cost 𝒮 n State.s₁ = ∑' (n : ), (1 - 𝓅 (𝒮_s₁ (𝒮_x 𝓅 𝒮 n))) * iFinset.range n, 𝓅 (𝒮_s₁ (𝒮_x 𝓅 𝒮 i))
                  theorem MDP.Counterexample.C.Path_s₁_prior (𝓅 : P) {i j : } (π : (M 𝓅).Path) (hi : i < π) (h : π[i] = State.s₁) (hij : j i) :
                  @[simp]
                  theorem MDP.Counterexample.C.𝒮_x_𝒮_len_one (𝓅 : P) {n : } :
                  𝒮_x 𝓅 (𝒮_len 𝓅 n) 1 = 𝒮_len 𝓅 (n + 1)
                  @[simp]
                  theorem MDP.Counterexample.C.𝒮_x_𝒮_len (𝓅 : P) {n m : } :
                  𝒮_x 𝓅 (𝒮_len 𝓅 n) m = 𝒮_len 𝓅 (n + m)
                  @[simp]
                  theorem MDP.Counterexample.C.𝒮_s₁_𝒮_len (𝓅 : P) {i : } :
                  𝒮_s₁ (𝒮_len 𝓅 i) = i + 1
                  theorem MDP.Counterexample.C.iSup_EC_𝒮_len (𝓅 : P) {i : } :
                  ⨆ (n : ), EC M.cost (𝒮_len 𝓅 i) n State.s₁ = ∑' (n : ), (1 - 𝓅 (i + n + 1)) * xFinset.range n, 𝓅 (i + x + 1)
                  theorem MDP.Counterexample.C.le_of_s₁_eq_s₁ (𝓅 : P) {i : } (π : (M 𝓅).Path) {hi : i < π} (h : π[i] = State.s₁) {j : } (hj : j i) :
                  theorem MDP.Counterexample.C.ge_of_s₁_eq_s₁ (𝓅 : P) {i j : } (π : (M 𝓅).Path) {hi : i < π} (h : π[i] = State.s₃) (hj : i j) (hj' : j < π) :
                  theorem MDP.Counterexample.C.lt_of_s₂_eq_s₁ (𝓅 : P) {i : } (π : (M 𝓅).Path) {hi : i < π} (h : π[i] = State.s₂) {j : } (hj : j < i) :
                  theorem MDP.Counterexample.C.gt_of_s₂_eq_s₃ (𝓅 : P) {i : } (π : (M 𝓅).Path) {hi : i < π} (h : π[i] = State.s₂) {j : } (hj : i < j) (hj' : j < π) :
                  theorem MDP.Counterexample.C.s₂_mem_of_s₁_s₃_mem (𝓅 : P) (π : (M 𝓅).Path) (hs₁ : State.s₁ π) (hs₃ : State.s₃ π) :
                  theorem MDP.Counterexample.C.Cost_one_of_s₂_mem {𝓅✝ : P} {π : (M 𝓅✝).Path} (hs₂ : State.s₂ π) :
                  theorem MDP.Counterexample.C.EC_𝒮_len' (𝓅 : P) {i n : } :
                  EC M.cost (𝒮_len 𝓅 i) n State.s₁ = if n = 0 then 0 else 1 - ∑' (π : Path[M 𝓅,State.s₁,=n]), if sπ, s = State.s₁ then Path.Prob (𝒮_len 𝓅 i) π else 0
                  theorem MDP.Counterexample.C.tsum_paths_eq_ite_tprod (𝓅 : P) {n i : } :
                  (∑' (π : Path[M 𝓅,State.s₁,=n]), if sπ, s = State.s₁ then Path.Prob (𝒮_len 𝓅 i) π else 0) = if n = 0 then 0 else x : Fin (n - 1), 𝓅 (x + i + 1)
                  @[simp]
                  theorem MDP.Counterexample.C.𝒮_x_ℒ (𝓅 : P) {i : } ( : 𝔏[M 𝓅]) :
                  𝒮_x 𝓅 (↑) i =
                  theorem MDP.Counterexample.C.iSup_ECℒ (𝓅 : P) ( : 𝔏[M 𝓅]) :
                  ⨆ (n : ), EC M.cost (↑) n State.s₁ = 1
                  theorem MDP.Counterexample.C.iSup_iSup_ECℒ (𝓅 : P) :
                  ⨆ ( : 𝔏[M 𝓅]), ⨆ (n : ), EC M.cost (↑) n State.s₁ = 1
                  theorem MDP.Counterexample.C.iInf_iSup_ECℒ (𝓅 : P) :
                  ⨅ ( : 𝔏[M 𝓅]), ⨆ (n : ), EC M.cost (↑) n State.s₁ = 1
                  noncomputable def MDP.Counterexample.C.p :
                  Equations
                  Instances For
                    theorem MDP.Counterexample.C.iInf_iSup_EC_ab :
                    ⨅ (𝒮 : 𝔖[M p]), ⨆ (n : ), EC M.cost 𝒮 n State.s₁ ⨆ (n : ), 1 - x : Fin (n - 1), p (x + 1)
                    theorem MDP.Counterexample.C.prod_p_eq' {n : } :
                    x : Fin n, p (x + 1) = 2 ^ (2 ^ (-n) - 1)
                    theorem MDP.Counterexample.C.iInf_iSup_EC_lt_iInf_iSup_ECℒ :
                    ⨅ (𝒮 : 𝔖[M p]), ⨆ (n : ), EC M.cost 𝒮 n State.s₁ < ⨅ ( : 𝔏[M p]), ⨆ (n : ), EC M.cost (↑) n State.s₁