Commit 16316936 authored by xiaowei_xing

test

Parent 882abf27
@@ -345,4 +345,8 @@ $$
$$
= \phi(s,a) - \mathbb{E}_ {a'\sim\pi_{\theta}(a'|s)}[\phi(s,a')]
$$
\ No newline at end of file
$$
## 6.2 Continuous Action Space: Gaussian Policy
For continuous action spaces, a common choice is a Gaussian policy: $a \sim \mathcal{N}(\mu(s),\sigma^2)$.
\ No newline at end of file
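
As a rough illustration of the Gaussian policy introduced in section 6.2, the sketch below samples an action and evaluates the score function $\nabla_\theta \log \pi_\theta(a|s)$. It is not part of this commit. It assumes a linear mean $\mu(s) = \theta^\top \phi(s)$ with state features $\phi(s)$ and a fixed standard deviation $\sigma$; neither assumption is stated in the added text. The function names are hypothetical.

```python
import math
import random


def mean_action(theta, phi_s):
    """Assumed linear mean: mu(s) = theta . phi(s)."""
    return sum(t * p for t, p in zip(theta, phi_s))


def sample_action(theta, phi_s, sigma, rng=random):
    """Draw a ~ N(mu(s), sigma^2)."""
    return rng.gauss(mean_action(theta, phi_s), sigma)


def log_prob(a, theta, phi_s, sigma):
    """Log-density of N(mu(s), sigma^2) evaluated at action a."""
    mu = mean_action(theta, phi_s)
    return (-0.5 * ((a - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


def score(a, theta, phi_s, sigma):
    """Gradient of log_prob w.r.t. theta: (a - mu(s)) / sigma^2 * phi(s)."""
    mu = mean_action(theta, phi_s)
    return [(a - mu) / sigma ** 2 * p for p in phi_s]
```

Under these assumptions the score has the same "observed minus expected" shape as the softmax score in the preceding equation: the sampled action $a$ minus its mean $\mu(s)$, scaled by $\phi(s)/\sigma^2$.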