提交 597c4b1a 编写于 作者: X xiaowei_xing

test

上级 c2b57ecc
......@@ -198,4 +198,13 @@ $$
$$
\nabla_{\theta}\log P(\tau^{(i)};\theta) = \nabla_{\theta}\log [\underbrace{\mu(s_0)}_ {\text{initial state distribution}} \prod_{t=0}^{T-1} \underbrace{\pi_{\theta}(a_t|s-t)}_ {\text{policy}} \underbrace{P(s_{t+1}|s_t,a_t)}_{\text{dynamics model}}]
\tag{8}
$$
$$
= \nabla_{\theta} [\log \mu(s_0) + \sum_{t=0}^{T-1}\log \pi_{\theta}(a_t|s_t) + \log P(s_{t+1}|s_t,a_t)]
\tag{9}
$$
$$
\sum_{t=0}^{T-1} \underbrace{\nabla_{\theta}\log \pi_{\theta}(a_t|s_t)}_{\text{no dynamics model required!}}。
$$
\ No newline at end of file
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册