Commit 1cc38dee, author: xiaowei_xing

test

Parent 57024805
@@ -394,5 +394,25 @@ $$
 $$
 $$
-= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}}[b(s_t) \sum_a \pi_{\theta}(a_t|s_t)\frac{\nabla_{\theta}\pi_{\theta}(a_t|s_t)}{\pi_{\theta}(a_t|s_t)}] \quad \text{(expand the expectation + test differentiate the logarithm)}
+= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}}[b(s_t) \sum_{a_t} \pi_{\theta}(a_t|s_t)\frac{\nabla_{\theta}\pi_{\theta}(a_t|s_t)}{\pi_{\theta}(a_t|s_t)}] \quad \text{(expand the expectation + differentiate the logarithm)}
+$$
+$$
+= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}}[b(s_t) \sum_{a_t} \nabla_{\theta}\pi_{\theta}(a_t|s_t)]
+$$
+$$
+= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}}[b(s_t) \nabla_{\theta} \sum_{a_t} \pi_{\theta}(a_t|s_t)]
+$$
+$$
+= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}}[b(s_t) \nabla_{\theta} 1]
+$$
+$$
+= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}}[b(s_t) \cdot 0]
+$$
+$$
+= 0.
 $$
\ No newline at end of file
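The derivation added in this hunk can be sanity-checked numerically. The sketch below is not part of the commit; it assumes a hypothetical softmax policy over three actions for a single fixed state, with made-up logits `theta` and a made-up baseline value `b`. It evaluates b(s_t) \sum_{a_t} \pi_{\theta}(a_t|s_t) \nabla_{\theta} \log \pi_{\theta}(a_t|s_t) using finite differences, which should come out as the zero vector up to numerical error.

```python
import math


def softmax(theta):
    """Action probabilities pi_theta(a) for one fixed state (assumed softmax policy)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, a, eps=1e-6):
    """Central finite-difference gradient of log pi_theta(a) with respect to theta."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        up[j] += eps
        down = list(theta)
        down[j] -= eps
        diff = math.log(softmax(up)[a]) - math.log(softmax(down)[a])
        grad.append(diff / (2 * eps))
    return grad


def baseline_term(theta, b):
    """b * sum_a pi(a) * grad log pi(a): the inner sum of the derivation for one state."""
    pi = softmax(theta)
    total = [0.0] * len(theta)
    for a, p in enumerate(pi):
        g = grad_log_pi(theta, a)
        for j in range(len(theta)):
            total[j] += b * p * g[j]
    return total


# Hypothetical logits and baseline value, chosen only for illustration.
theta = [0.3, -1.2, 2.0]
term = baseline_term(theta, b=5.0)
print(term)  # each component should be numerically close to zero
```

Because the result does not depend on the value of `b`, this matches the conclusion of the hunk: a baseline that depends only on the state adds no bias to the gradient estimate.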