Unverified commit 00320a9b, authored by Denis Tarasov, committed by GitHub

Update adam.py (#1278)

Make the add operation in-place. Without it, the momentum buffer (exp_avg) decays to zero and training has no effect on the corresponding parameters.
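The bug is that `Tensor.add` returns a new tensor instead of mutating its operand, so the gradient term computed after `mul_(beta1)` is discarded and `exp_avg` only ever shrinks. A minimal sketch of this effect, using plain Python floats rather than torch tensors to mimic the update:

```python
beta1 = 0.9
grad = 1.0

# Buggy update: mimics exp_avg.mul_(beta1).add(1 - beta1, grad).
# The multiply is in-place, but the add's result is thrown away.
exp_avg_buggy = 0.5
for _ in range(100):
    exp_avg_buggy *= beta1                    # takes effect
    _ = exp_avg_buggy + (1 - beta1) * grad    # discarded, like .add()

# Fixed update: mimics exp_avg.mul_(beta1).add_(1 - beta1, grad).
exp_avg_fixed = 0.5
for _ in range(100):
    exp_avg_fixed = exp_avg_fixed * beta1 + (1 - beta1) * grad

print(exp_avg_buggy)  # decays toward zero
print(exp_avg_fixed)  # converges toward grad
```

After 100 steps the buggy buffer has collapsed to roughly `0.5 * 0.9**100`, while the fixed one tracks the gradient's running average as intended.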
Parent adc21a4d
@@ -205,7 +205,7 @@ class OnebitAdam(torch.optim.Optimizer):
     if 'non_freeze' in group.keys() and group['non_freeze'] is True:
         dist.all_reduce(grad)
         grad.mul_(1 / dist.get_world_size())
-        exp_avg.mul_(beta1).add(1 - beta1, grad)
+        exp_avg.mul_(beta1).add_(1 - beta1, grad)
         exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
         grad = None
     else: