提交 e78d9faf 编写于 作者: X xiaowei_xing

test

上级 4dea6021
......@@ -153,4 +153,16 @@ $$
$$
= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}} [\mathbb{E}_ {s_{(t+1):T},a_{t:(T-1)}} [\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)b(s_t)]]
$$
$$
= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}} [b(s_t) \mathbb{E}_ {s_{(t+1):T},a_{t:(T-1)}}[\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)]]
$$
$$
= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}} [b(s_t) \mathbb{E}_{a_t}[\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)]]
$$
$$
= \mathbb{E}_ {s_{0:t},a_{0:(t-1)}} [b(s_t) \cdot 0] = 0。
$$
\ No newline at end of file
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册