提交 ab581c0d 编写于 作者: X xiaowei_xing

test

上级 29b8a86d
......@@ -84,4 +84,8 @@ $$
$$
= \mathbb{E}_ {\tau\sim\pi_{\theta}} [\sum_{t=1}^{T}(\nabla_{\theta}(\log\pi_{\theta}(a_t|s_t))(\sum_{t=1}^{T}\gamma^t r(s_t,a_t)))]
$$
$$
\approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} (\nabla_{\theta} (\log\pi_{\theta}(a_{i,t|}|s_{i,t}))(\sum_{t=1}^{T}\gamma^t r(s_{i,t},a_{i,t})))
$$
\ No newline at end of file
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册