Commit ae3a0608 authored by xiaowei_xing

test

Parent 4719ab38
......@@ -318,4 +318,5 @@ $$
In a discrete action space, we usually parameterize the policy with the softmax function:
$$
\pi_{\theta}(a|s)=\frac{e^{\phi(s,a)^{\text{T}}\theta}}{\sum_{a'} e^{\phi(s,a')^{\text{T}}\theta}}.
\ No newline at end of file
\pi_{\theta}(a|s)=\frac{e^{\phi(s,a)^{\text{T}}\theta}}{\sum_{a'} e^{\phi(s,a')^{\text{T}}\theta}}.
$$
\ No newline at end of file
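
For readers of this hunk, the softmax parameterization can be sketched numerically as follows. This is a minimal illustration, not code from the repository: the function name `softmax_policy`, the feature matrix `phi_s`, and the example values of `theta` are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def softmax_policy(theta, features):
    """Return the action probabilities pi_theta(.|s) for a single state.

    features: array of shape (num_actions, d); row a holds phi(s, a).
    theta:    array of shape (d,), the policy parameters.
    """
    logits = features @ theta        # phi(s, a)^T theta for every action a
    logits = logits - logits.max()   # constant shift for stability; the ratio is unchanged
    weights = np.exp(logits)
    return weights / weights.sum()   # normalize over all actions a'


# Hypothetical example: 3 actions with 2-dimensional features.
phi_s = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
theta = np.array([0.5, -0.5])
probs = softmax_policy(theta, phi_s)
print(probs)
```

Subtracting the maximum logit before exponentiating multiplies numerator and denominator by the same factor, so the result matches the formula while avoiding overflow for large values of $\phi(s,a)^{\text{T}}\theta$.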