Commit 98ff6fba, authored by xiaowei_xing

test

Parent 9bba28ec
@@ -604,9 +604,9 @@ $\text{gradients = loss.gradients(loss, variables)}$
 5. <span id="ref5">J. Schulman et al, "Trust region policy optimization," *ICML*, 2015.</span>
-## A TRPO 证明(TRPO Proofs)
+## A. TRPO 证明(TRPO Proofs)
-<span id="lemma51p">### A.1 奖励调整(Reward Shaping)</span>
+### <span id="lemma51p">A.1 奖励调整(Reward Shaping)</span>
 这里我们证明[引理 5.1](#lemma51)
......