diff --git a/docs/10.md b/docs/10.md
index 3c964c4918964032022b9917d79819714c097243..a3432bf8400e800ce27da68a4c006de6263ceb40 100644
--- a/docs/10.md
+++ b/docs/10.md
@@ -604,9 +604,9 @@ $\text{gradients = loss.gradients(loss, variables)}$
5. J. Schulman et al., "Trust region policy optimization," *ICML*, 2015.
-## A TRPO Proofs
+## A. TRPO Proofs
-### A.1 Reward Shaping
+### A.1 Reward Shaping
Here we prove [Lemma 5.1](#lemma51).