Unverified · Commit 288664c1 authored by: W whs, committed by: GitHub

Merge pull request #799 from wanghaoshuang/fix_policy_gradient

Adapt usage of reduce_mean to the latest fluid API.
@@ -45,7 +45,7 @@ class PolicyGradient:
             label=acts)  # this is negative log of chosen action
         neg_log_prob_weight = fluid.layers.elementwise_mul(x=neg_log_prob, y=vt)
         loss = fluid.layers.reduce_mean(
-            x=neg_log_prob_weight)  # reward guided loss
+            neg_log_prob_weight)  # reward guided loss
         sgd_optimizer = fluid.optimizer.SGD(self.lr)
         sgd_optimizer.minimize(loss)
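The patched lines compute a reward-guided loss: the negative log-probability of each chosen action is weighted elementwise by its return `vt`, then averaged. A minimal NumPy sketch of that computation (the values below are hypothetical stand-ins for the fluid tensors in the diff):

```python
import numpy as np

# Hypothetical per-step values standing in for the fluid tensors:
# neg_log_prob: negative log-likelihood of each chosen action
# vt: discounted (and typically normalized) return for each step
neg_log_prob = np.array([0.5, 1.2, 0.3])
vt = np.array([1.0, -0.5, 2.0])

# elementwise_mul followed by reduce_mean, mirroring the patched code
neg_log_prob_weight = neg_log_prob * vt  # reward-weighted loss terms
loss = neg_log_prob_weight.mean()        # reward guided loss
print(float(loss))
```

Minimizing this mean pushes up the log-probability of actions that led to positive returns and pushes it down for negative ones, which is the core of the policy gradient update the class implements.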