test

a3b14ee8 · xiaowei_xing · ae3a0608 · a3b14ee8

Showing 1 changed file with 7 additions and 1 deletion

docs/8&9.md +7 -1

--- a/docs/8&9.md
+++ b/docs/8&9.md
@@ -318,5 +318,11 @@ $$
 In a discrete action space, we usually parameterize the policy with the softmax function:
 $$
-\pi_{\theta}(a|s)=\frac{e^{\phi(s,a)^{\text{T}}\theta}}{sum_{a'} e^{\phi(s,a')^{\text{T}}\theta}}。
+\pi_{\theta}(a|s)=\frac{e^{\phi(s,a)^{\text{T}}\theta}}{\sum_{a'} e^{\phi(s,a')^{\text{T}}\theta}}。
+$$
+The score function then becomes:
+$$
+\nabla_{\theta} \log \pi_{\theta}(a|s) = \nabla_{\theta} [\phi(s,a)^{\text{T}}\theta - \log \sum_{a'}e^{\phi(s,a')^{\text{T}}\theta}]
 $$
\ No newline at end of file
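
Review note: the gradient added in this hunk can be evaluated in closed form, since differentiating the log-sum-exp term gives the policy-weighted average of the features, so $\nabla_{\theta} \log \pi_{\theta}(a|s) = \phi(s,a) - \sum_{a'} \pi_{\theta}(a'|s)\,\phi(s,a')$. The snippet below is a minimal sketch of that identity, not code from the repository. The function names `softmax_policy` and `score`, the feature values, and `theta` are all hypothetical and chosen only for illustration.

```python
import math

def softmax_policy(features, theta):
    """Return pi_theta(a|s) for every action.

    features: list of per-action feature vectors phi(s, a) (hypothetical layout).
    theta: parameter vector with the same length as each feature vector.
    """
    logits = [sum(f * t for f, t in zip(phi, theta)) for phi in features]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score(features, theta, action):
    """Score function: phi(s, a) minus the expected feature vector under pi_theta."""
    probs = softmax_policy(features, theta)
    dim = len(theta)
    expected = [sum(p * phi[i] for p, phi in zip(probs, features)) for i in range(dim)]
    return [features[action][i] - expected[i] for i in range(dim)]

# Illustrative values only: 3 actions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [0.5, -0.25]
probs = softmax_policy(features, theta)
g = score(features, theta, action=0)
```

A quick sanity check on such a sketch is that the score averages to zero under the policy, i.e. $\sum_{a} \pi_{\theta}(a|s)\,\nabla_{\theta} \log \pi_{\theta}(a|s) = 0$, and that it agrees with a finite-difference gradient of $\log \pi_{\theta}(a|s)$.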