Commit 98ff6fba, authored by xiaowei_xing

test

Parent 9bba28ec
......@@ -604,9 +604,9 @@ $\text{gradients = loss.gradients(loss, variables)}$
5. <span id="ref5">J. Schulman et al, "Trust region policy optimization," *ICML*, 2015.</span>
-## A TRPO Proofs
+## A. TRPO Proofs
-<span id="lemma51p">### A.1 Reward Shaping</span>
+### <span id="lemma51p">A.1 Reward Shaping</span>
Here we prove [Lemma 5.1](#lemma51)
......