test

78ffb3e9 · xiaowei_xing · 8c4ac8f8 · 78ffb3e9

Showing 1 changed file with 5 additions and 1 deletion

docs/8&9.md docs/8&9.md +5 -1

--- a/docs/8&9.md
+++ b/docs/8&9.md
@@ -375,4 +375,8 @@ $$
 \hat{A}_t=(G_t^{(i)}-b(s_t)).
 $$

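-Second, why are we allowed to do this? It turns out that subtracting a baseline in this way introduces no bias into the gradient estimate: $\mathbb{E}_{\tau}[b(s_t)\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)]$ is $0$, so the expected gradient update is unchanged.
\ No newline at end of file
+Second, why are we allowed to do this? It turns out that subtracting a baseline in this way introduces no bias into the gradient estimate: $\mathbb{E}_{\tau}[b(s_t)\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)]$ is $0$, so the expected gradient update is unchanged.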
+
+$$
+\mathbb{E}_ {\tau\sim\pi_{\theta}}[b(s_t)\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)]
+$$
\ No newline at end of file
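
Note on the hunk above (not part of the commit): the added paragraph states that the baseline term has zero expectation but does not show why. A minimal sketch of the standard argument follows, assuming a discrete action space (replace the sum with an integral for continuous actions), a baseline $b(s_t)$ that depends only on the state and not on $a_t$, and enough regularity to exchange $\nabla_{\theta}$ with the sum. The summation index $a$ is introduced here for illustration and does not appear in the original.

$$
\begin{aligned}
\mathbb{E}_{\tau\sim\pi_{\theta}}[b(s_t)\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)]
&= \mathbb{E}_{s_t}\Big[b(s_t)\,\mathbb{E}_{a_t\sim\pi_{\theta}(\cdot|s_t)}[\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)]\Big] \\
&= \mathbb{E}_{s_t}\Big[b(s_t)\sum_{a}\pi_{\theta}(a|s_t)\,\frac{\nabla_{\theta}\pi_{\theta}(a|s_t)}{\pi_{\theta}(a|s_t)}\Big] \\
&= \mathbb{E}_{s_t}\Big[b(s_t)\,\nabla_{\theta}\sum_{a}\pi_{\theta}(a|s_t)\Big] \\
&= \mathbb{E}_{s_t}\big[b(s_t)\,\nabla_{\theta}1\big] = 0.
\end{aligned}
$$

The key step is that the action probabilities sum to $1$ for every $\theta$, so their gradient vanishes. The argument would not go through for a baseline that depends on $a_t$, because it could then no longer be pulled out of the inner expectation.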