Commit a433a0a5 by xiaowei_xing (parent: a5cdfa57)
$$
$$
\epsilon_f^{\pi'} = \mathop{\max}_ {s} \left| \mathbb{E}_{a\sim\pi'(\cdot|s),s'\sim M(\cdot|s,a)} [R(s,a,s')+\gamma f(s') - f(s)] \right|,
$$
we have the upper bound
$$
V^{\pi'}-V^{\pi} \leq \frac{1}{1-\gamma}(L_{\pi}(\pi') + \lVert d^{\pi'}-d^{\pi} \rVert_1 \epsilon_f^{\pi'})
\tag{6}
$$
and the lower bound
$$
V^{\pi'}-V^{\pi} \geq \frac{1}{1-\gamma}(L_{\pi}(\pi') - \lVert d^{\pi'}-d^{\pi} \rVert_1 \epsilon_f^{\pi'}).
\tag{7}
$$
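Bounds (6) and (7) can be sanity-checked numerically on a small random tabular MDP. The following is a minimal sketch, not the method of [4], and it rests on several assumptions.

- The definition of $L_{\pi}(\pi')$ given earlier in the text is taken to be $\mathbb{E}_{s\sim d^{\pi},a\sim\pi(\cdot|s)}[\frac{\pi'(a|s)}{\pi(a|s)}A^{\pi}(s,a)]$.
- $f$ is chosen as $V^{\pi}$.
- $d^{\pi}(s) = (1-\gamma)\sum_t \gamma^t \Pr(s_t = s)$ is the normalized discounted state distribution.
- $\epsilon_f^{\pi'}$ uses an absolute value inside the maximum.
- $V^{\pi'}-V^{\pi}$ is the difference in expected return from a start distribution `mu`, assumed uniform.

The arrays `P[s, a, s']` $= M(s'|s,a)$ and `R[s, a, s']` $= R(s,a,s')$, as well as all helper names, are hypothetical.

```python
import numpy as np


def transition_matrix(P, pi):
    """P_pi[s, s'] = sum_a pi(a|s) M(s'|s,a)."""
    return np.einsum("sa,sat->st", pi, P)


def state_values(P, R, pi, gamma):
    """V^pi(s), obtained by solving (I - gamma P_pi) V = r_pi."""
    r_pi = np.einsum("sa,sat,sat->s", pi, P, R)  # E_{a~pi, s'~M}[R(s,a,s')]
    n = len(r_pi)
    return np.linalg.solve(np.eye(n) - gamma * transition_matrix(P, pi), r_pi)


def discounted_state_dist(P, pi, mu, gamma):
    """d^pi(s) = (1 - gamma) sum_t gamma^t Pr(s_t = s | pi, s_0 ~ mu)."""
    n = len(mu)
    # Solves d^T (I - gamma P_pi) = (1 - gamma) mu^T; the result sums to 1.
    return (1 - gamma) * np.linalg.solve((np.eye(n) - gamma * transition_matrix(P, pi)).T, mu)


def epsilon_f(P, R, pi_new, f, gamma):
    """max_s | E_{a~pi'(.|s), s'~M(.|s,a)}[R(s,a,s') + gamma f(s') - f(s)] |."""
    delta = R + gamma * f[None, None, :] - f[:, None, None]  # delta[s, a, s']
    return np.max(np.abs(np.einsum("sa,sat,sat->s", pi_new, P, delta)))


def check_bounds(seed, n_states=4, n_actions=3, gamma=0.9):
    """Return (lower bound (7), true gap V^{pi'} - V^pi, upper bound (6))."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)                         # M(s'|s,a)
    R = rng.standard_normal((n_states, n_actions, n_states))  # R(s,a,s')
    mu = np.full(n_states, 1.0 / n_states)                    # start distribution (assumed uniform)
    pi = rng.random((n_states, n_actions))
    pi /= pi.sum(axis=1, keepdims=True)
    pi_new = rng.random((n_states, n_actions))
    pi_new /= pi_new.sum(axis=1, keepdims=True)

    v = state_values(P, R, pi, gamma)                         # f = V^pi (assumed choice of f)
    q = np.einsum("sat,sat->sa", P, R + gamma * v[None, None, :])
    adv = q - v[:, None]                                      # A^pi(s,a) = Q^pi(s,a) - V^pi(s)
    d_old = discounted_state_dist(P, pi, mu, gamma)
    d_new = discounted_state_dist(P, pi_new, mu, gamma)

    # Assumed form of L_pi(pi'): E_{s~d^pi, a~pi}[pi'(a|s)/pi(a|s) A^pi(s,a)]
    L = np.sum(d_old[:, None] * pi * (pi_new / pi) * adv)
    eps = epsilon_f(P, R, pi_new, v, gamma)
    dist = np.abs(d_new - d_old).sum()                        # ||d^{pi'} - d^pi||_1

    gap = mu @ state_values(P, R, pi_new, gamma) - mu @ v     # V^{pi'} - V^pi
    lower = (L - dist * eps) / (1 - gamma)                    # bound (7)
    upper = (L + dist * eps) / (1 - gamma)                    # bound (6)
    return lower, gap, upper


for seed in range(3):
    lower, gap, upper = check_bounds(seed)
    print(f"seed={seed}: {lower:.3f} <= {gap:.3f} <= {upper:.3f}")
```

Under these assumptions, each printed gap should fall between the two bounds. Solving the linear systems exactly, rather than averaging sampled rollouts, keeps the check free of Monte Carlo noise.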
For a proof of this lemma, see [4] and Appendix A.3.
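One way to see where the $\lVert d^{\pi'}-d^{\pi} \rVert_1$ term comes from is sketched below, which is not necessarily the proof given in [4]. The sketch assumes that $f = V^{\pi}$, that $L_{\pi}(\pi') = \mathbb{E}_{s\sim d^{\pi},a\sim\pi(\cdot|s)}[\frac{\pi'(a|s)}{\pi(a|s)}A^{\pi}(s,a)]$, and that $d^{\pi}$ is the normalized discounted state distribution. Write $\bar{A}(s) = \mathbb{E}_{a\sim\pi'(\cdot|s)}[A^{\pi}(s,a)]$, so that $L_{\pi}(\pi') = \mathbb{E}_{s\sim d^{\pi}}[\bar{A}(s)]$, and combine the performance difference identity with Hölder's inequality:
$$
\begin{aligned}
(1-\gamma)(V^{\pi'}-V^{\pi}) &= \mathbb{E}_{s\sim d^{\pi'}}[\bar{A}(s)] = L_{\pi}(\pi') + \sum_{s} \big(d^{\pi'}(s)-d^{\pi}(s)\big)\bar{A}(s), \\
\Big|\sum_{s} \big(d^{\pi'}(s)-d^{\pi}(s)\big)\bar{A}(s)\Big| &\leq \lVert d^{\pi'}-d^{\pi} \rVert_1 \mathop{\max}_ {s} \lvert \bar{A}(s) \rvert .
\end{aligned}
$$
With $f = V^{\pi}$, we have $\bar{A}(s) = \mathbb{E}_{a\sim\pi'(\cdot|s),s'\sim M(\cdot|s,a)}[R(s,a,s')+\gamma f(s') - f(s)]$. Hence $\mathop{\max}_ {s} \lvert \bar{A}(s) \rvert$ is $\epsilon_f^{\pi'}$ taken with an absolute value, and rearranging gives (6) and (7).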
Note that an upper bound on $\lVert d^{\pi'}-d^{\pi} \rVert_1$ is already given by Lemma 5.2, so it can be substituted into (6) and (7).