Unverified commit 01d17492, authored by hablb, committed by GitHub

Fix memory leak in zero2 contiguous gradients (#3306)

extra_large_param_to_reduce is never consumed when contiguous_gradients is False. Setting it unconditionally therefore kept a reference to the param for the lifetime of the application.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Parent 0e357666
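
To make the leak concrete, here is a minimal sketch. The BucketedReducer class is a hypothetical stand-in for DeepSpeedZeroOptimizer: only reduce_bucket_size, contiguous_gradients, and extra_large_param_to_reduce mirror names from the real code, everything else is illustrative.

import gc
import weakref

import torch


class BucketedReducer:
    """Hypothetical stand-in for the ZeRO stage 1/2 optimizer's bucketing state."""

    def __init__(self, reduce_bucket_size, contiguous_gradients):
        self.reduce_bucket_size = reduce_bucket_size
        self.contiguous_gradients = contiguous_gradients
        self.extra_large_param_to_reduce = None

    def stash_pre_fix(self, param):
        # Pre-fix behavior: the attribute is set regardless of
        # contiguous_gradients, but it is only consumed and cleared on the
        # contiguous-gradients path, so with the flag off the last
        # extra-large param stays referenced for the optimizer's lifetime.
        if param.numel() > self.reduce_bucket_size:
            self.extra_large_param_to_reduce = param

    def stash_post_fix(self, param):
        # Post-fix behavior: only stash the param when the path that later
        # clears the attribute is actually active.
        if self.contiguous_gradients:
            if param.numel() > self.reduce_bucket_size:
                self.extra_large_param_to_reduce = param


reducer = BucketedReducer(reduce_bucket_size=10, contiguous_gradients=False)
param = torch.nn.Parameter(torch.zeros(100))
alive = weakref.ref(param)

reducer.stash_pre_fix(param)
del param
gc.collect()
print(alive() is not None)  # True: the reducer still pins the param

With stash_post_fix instead, the same check prints False, because nothing keeps the param alive once the caller drops it. The diff below applies exactly that restructuring: the extra-large-param check is nested under the contiguous_gradients guard.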
@@ -839,10 +839,10 @@ class DeepSpeedZeroOptimizer(ZeROOptimizer):
             Gradient computed twice for this partition. \
             Multiple gradient reduction is currently not supported"
-        if param.numel() > self.reduce_bucket_size:
-            self.extra_large_param_to_reduce = param
-        elif self.contiguous_gradients:
-            # keeping the gradients contiguous to prevent memory fragmentation, and avoid flattening
-            new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
-            new_grad_tensor.copy_(param.grad.view(-1))
+        if self.contiguous_gradients:
+            if param.numel() > self.reduce_bucket_size:
+                self.extra_large_param_to_reduce = param
+            else:
+                # keeping the gradients contiguous to prevent memory fragmentation, and avoid flattening
+                new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
+                new_grad_tensor.copy_(param.grad.view(-1))
......
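
For reference, here is a hedged sketch of what the contiguous-gradients branch does after the fix. The module-level names (reduce_bucket_size, ipg_buffer, ipg_index, elements_in_ipg_bucket) are simplified stand-ins for the optimizer's state with assumed illustrative sizes, not DeepSpeed's defaults; only the branch body mirrors the diff.

import torch

# Simplified stand-ins for the optimizer state referenced in the diff
# (assumed illustrative values).
reduce_bucket_size = 1_000
ipg_buffer = [torch.zeros(2_000)]  # one preallocated flat bucket
ipg_index = 0
elements_in_ipg_bucket = 0
extra_large_param_to_reduce = None


def copy_grad_into_bucket(param):
    """Mirrors the fixed branch: runs only when contiguous gradients are on."""
    global elements_in_ipg_bucket, extra_large_param_to_reduce
    if param.numel() > reduce_bucket_size:
        # Too big for the bucket: remember it for a dedicated reduction.
        extra_large_param_to_reduce = param
    else:
        # Copy the grad into a slice of the flat buffer and re-point
        # param.grad at that slice, keeping gradients contiguous and
        # avoiding a later flatten before the reduce.
        new_grad_tensor = ipg_buffer[ipg_index].narrow(0, elements_in_ipg_bucket, param.numel())
        new_grad_tensor.copy_(param.grad.view(-1))
        param.grad.data = new_grad_tensor.data.view_as(param.grad)
        elements_in_ipg_bucket += param.numel()


p = torch.nn.Parameter(torch.randn(10, 10))
p.grad = torch.ones_like(p)
copy_grad_into_bucket(p)
assert p.grad.data_ptr() == ipg_buffer[0].data_ptr()  # grad now lives in the bucket

Because the whole branch is now guarded by contiguous_gradients, neither the buffer copy nor the extra_large_param_to_reduce assignment happens when the feature is off.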