Unverified commit 00320a9b, authored by Denis Tarasov, committed by GitHub

Update adam.py (#1278)

Make the add operation in-place. Without it, the momentum buffer (exp_avg) decays to zero and training has no effect on the corresponding parameters.
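The bug is that `Tensor.add` returns a new tensor instead of mutating its operand, so the gradient term computed after `mul_(beta1)` is discarded and `exp_avg` only ever shrinks. A minimal sketch of this effect, using plain Python floats rather than torch tensors to mimic the update:

```python
beta1 = 0.9
grad = 1.0

# Buggy update: mimics exp_avg.mul_(beta1).add(1 - beta1, grad).
# The multiply is in-place, but the add's result is thrown away.
exp_avg_buggy = 0.5
for _ in range(100):
    exp_avg_buggy *= beta1                    # takes effect
    _ = exp_avg_buggy + (1 - beta1) * grad    # discarded, like .add()

# Fixed update: mimics exp_avg.mul_(beta1).add_(1 - beta1, grad).
exp_avg_fixed = 0.5
for _ in range(100):
    exp_avg_fixed = exp_avg_fixed * beta1 + (1 - beta1) * grad

print(exp_avg_buggy)  # decays toward zero
print(exp_avg_fixed)  # converges toward grad
```

After 100 steps the buggy buffer has collapsed to roughly `0.5 * 0.9**100`, while the fixed one tracks the gradient's running average as intended.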
Parent adc21a4d
@@ -205,7 +205,7 @@ class OnebitAdam(torch.optim.Optimizer):
     if 'non_freeze' in group.keys() and group['non_freeze'] is True:
         dist.all_reduce(grad)
         grad.mul_(1 / dist.get_world_size())
-        exp_avg.mul_(beta1).add(1 - beta1, grad)
+        exp_avg.mul_(beta1).add_(1 - beta1, grad)
         exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
         grad = None
     else: