diff --git a/docs/10.md b/docs/10.md
index e5db4216338cf93ccf5c2534693a2f82ed6e283c..509e4144fa6c905d4c9bc65a0ad066a69420565c 100644
--- a/docs/10.md
+++ b/docs/10.md
@@ -200,9 +200,11 @@ $$
 $$
 
 $$
-= 2\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2 b] - 2\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2 r(tau)] = 0,
+= 2\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2 b] - 2\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2 r(\tau)] = 0,
 $$
 
 $$
-b = \frac{\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2 r(tau)]}{\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2]}。
-$$
\ No newline at end of file
+b = \frac{\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2 r(\tau)]}{\mathbb{E} [(\nabla_{\theta} \log \pi_{\theta}(\tau))^2]}。
+$$
+
+## 3. 离线策略策略梯度(Off Policy Policy Gradient)
\ No newline at end of file