Commit c2b57ecc authored by xiaowei_xing

test

Parent c67c9640
@@ -180,7 +180,7 @@ $$
$$
$$
-=\mathbb{\tau \sim \pi_{\theta}}[R(\tau)\nabla_{\theta} \log P(\tau;\theta)].
+=\mathbb{E}_ {\tau \sim \pi_{\theta}}[R(\tau)\nabla_{\theta} \log P(\tau;\theta)].
\tag{6}
$$
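Equation (6) can be estimated without differentiating $R$: sample trajectories from $\pi_\theta$ and average $R(\tau)\nabla_\theta \log P(\tau;\theta)$. The sketch below assumes the simplest possible case: one-step trajectories with three actions and a softmax policy. Under that assumption $P(\tau;\theta)=\pi_\theta(a)$, and $\nabla_\theta\log\pi_\theta(a)$ is the one-hot vector of $a$ minus the action probabilities. The logits, rewards, and function names (`softmax`, `score`, `exact_grad`, `likelihood_ratio_grad`) are hypothetical illustrations, not code from these notes. The sketch only checks that the sample average approaches the analytic gradient of the expected reward $\sum_a \pi_\theta(a)\,r(a)$.

```python
import math
import random

def softmax(theta):
    """Action probabilities pi_theta(a) of a softmax policy with logits theta."""
    m = max(theta)  # shift logits for numerical stability
    z = [math.exp(t - m) for t in theta]
    s = sum(z)
    return [v / s for v in z]

def score(pi, a):
    """grad_theta log pi_theta(a) for a softmax policy: one_hot(a) - pi."""
    return [(1.0 if j == a else 0.0) - p for j, p in enumerate(pi)]

def exact_grad(theta, rewards):
    """Analytic gradient of sum_a pi_theta(a) * r(a): pi_j * (r_j - E_pi[r])."""
    pi = softmax(theta)
    mean_r = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - mean_r) for p, r in zip(pi, rewards)]

def likelihood_ratio_grad(theta, rewards, n_samples, rng):
    """Monte Carlo estimate of E_{tau ~ pi_theta}[R(tau) grad log P(tau; theta)]
    for one-step trajectories, where P(tau; theta) = pi_theta(a) and R(tau) = r(a)."""
    pi = softmax(theta)
    actions = rng.choices(range(len(pi)), weights=pi, k=n_samples)
    total = [0.0] * len(pi)
    for a in actions:
        for j, g in enumerate(score(pi, a)):
            total[j] += rewards[a] * g
    return [t / n_samples for t in total]

# Hypothetical example: three actions, fixed logits and rewards.
rng = random.Random(0)
theta = [0.5, -0.2, 0.1]
rewards = [1.0, 0.0, 2.0]
print(exact_grad(theta, rewards))
print(likelihood_ratio_grad(theta, rewards, 100_000, rng))
```

The estimate is unbiased but noisy. Its error shrinks only like $1/\sqrt{N}$ in the number of sampled trajectories $N$.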
@@ -196,6 +196,6 @@ $$
Second, computing $\nabla_{\theta}\log P(\tau^{(i)};\theta)$ is easier than computing $P(\tau^{(i)};\theta)$ directly:
$$
-\nabla_{\theta}\log P(\tau^{(i)};\theta) = \nabla_{\theta}\log []
+\nabla_{\theta}\log P(\tau^{(i)};\theta) = \nabla_{\theta}\log [\underbrace{\mu(s_0)}_ {\text{initial state distribution}} \prod_{t=0}^{T-1} \underbrace{\pi_{\theta}(a_t|s_t)}_ {\text{policy}} \underbrace{P(s_{t+1}|s_t,a_t)}_ {\text{dynamics model}}]
\tag{8}
$$
\ No newline at end of file
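Taking the log of eq. (8) turns the product into a sum: $\log P(\tau;\theta)=\log\mu(s_0)+\sum_{t=0}^{T-1}\log\pi_\theta(a_t|s_t)+\sum_{t=0}^{T-1}\log P(s_{t+1}|s_t,a_t)$. Only the policy terms depend on $\theta$, so $\nabla_\theta\log P(\tau;\theta)=\sum_{t=0}^{T-1}\nabla_\theta\log\pi_\theta(a_t|s_t)$, and the dynamics model is never needed. The following sketch checks this numerically. It assumes a hypothetical two-state, two-action MDP with a tabular softmax policy `theta[s][a]`, and it compares a finite-difference gradient of the full $\log P(\tau;\theta)$ (initial distribution and dynamics included) with the sum of policy score functions. All numbers and function names (`log_pi`, `log_prob_trajectory`, `policy_score`, `finite_diff_grad`) are illustrative assumptions.

```python
import math

def log_pi(logits, a):
    """log pi_theta(a|s) for a softmax over one state's logits (log-sum-exp form)."""
    m = max(logits)
    return logits[a] - m - math.log(sum(math.exp(x - m) for x in logits))

def log_prob_trajectory(theta, traj, mu, dynamics):
    """Full log P(tau; theta) from eq. (8); traj is a list of (s_t, a_t, s_{t+1})."""
    total = math.log(mu[traj[0][0]])                # initial state distribution
    for s, a, s_next in traj:
        total += log_pi(theta[s], a)                # policy: depends on theta
        total += math.log(dynamics[s][a][s_next])  # dynamics: independent of theta
    return total

def policy_score(theta, traj):
    """sum_t grad_theta log pi_theta(a_t|s_t) for a tabular softmax policy theta[s][a]."""
    grad = [[0.0] * len(row) for row in theta]
    for s, a, _ in traj:
        probs = [math.exp(log_pi(theta[s], b)) for b in range(len(theta[s]))]
        for b, p in enumerate(probs):
            grad[s][b] += (1.0 if b == a else 0.0) - p
    return grad

def finite_diff_grad(theta, traj, mu, dynamics, eps=1e-6):
    """Central-difference gradient of the full log P(tau; theta), dynamics included."""
    grad = [[0.0] * len(row) for row in theta]
    for s, row in enumerate(theta):
        for b in range(len(row)):
            plus = [r[:] for r in theta]
            minus = [r[:] for r in theta]
            plus[s][b] += eps
            minus[s][b] -= eps
            grad[s][b] = (log_prob_trajectory(plus, traj, mu, dynamics)
                          - log_prob_trajectory(minus, traj, mu, dynamics)) / (2 * eps)
    return grad

# Hypothetical two-state, two-action MDP; dynamics[s][a][s'] = P(s'|s,a).
theta = [[0.3, -0.1], [0.0, 0.7]]
mu = [0.6, 0.4]
dynamics = [[[0.9, 0.1], [0.2, 0.8]],
            [[0.5, 0.5], [0.3, 0.7]]]
traj = [(0, 1, 1), (1, 1, 0), (0, 0, 0)]
print(policy_score(theta, traj))
print(finite_diff_grad(theta, traj, mu, dynamics))
```

Replacing `dynamics` with any other valid transition table changes $\log P(\tau;\theta)$ but leaves its gradient with respect to $\theta$ unchanged. This is what lets the likelihood-ratio gradient be estimated from sampled trajectories alone, without a model of the dynamics.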