Unverified commit 01d17492, authored by hablb, committed by GitHub

Fix memory leak in zero2 contiguous gradients (#3306)

extra_large_param_to_reduce is never consumed when contiguous_gradients is False. Setting it unconditionally therefore kept a reference to the param for the lifetime of the application.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Parent 0e357666
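
To make the leak concrete, here is a minimal sketch. The BucketedReducer class is a hypothetical stand-in for DeepSpeedZeroOptimizer: only reduce_bucket_size, contiguous_gradients, and extra_large_param_to_reduce mirror names from the real code, everything else is illustrative.

import gc
import weakref

import torch


class BucketedReducer:
    """Hypothetical stand-in for the ZeRO stage 1/2 optimizer's bucketing state."""

    def __init__(self, reduce_bucket_size, contiguous_gradients):
        self.reduce_bucket_size = reduce_bucket_size
        self.contiguous_gradients = contiguous_gradients
        self.extra_large_param_to_reduce = None

    def stash_pre_fix(self, param):
        # Pre-fix behavior: the attribute is set regardless of
        # contiguous_gradients, but it is only consumed and cleared on the
        # contiguous-gradients path, so with the flag off the last
        # extra-large param stays referenced for the optimizer's lifetime.
        if param.numel() > self.reduce_bucket_size:
            self.extra_large_param_to_reduce = param

    def stash_post_fix(self, param):
        # Post-fix behavior: only stash the param when the path that later
        # clears the attribute is actually active.
        if self.contiguous_gradients:
            if param.numel() > self.reduce_bucket_size:
                self.extra_large_param_to_reduce = param


reducer = BucketedReducer(reduce_bucket_size=10, contiguous_gradients=False)
param = torch.nn.Parameter(torch.zeros(100))
alive = weakref.ref(param)

reducer.stash_pre_fix(param)
del param
gc.collect()
print(alive() is not None)  # True: the reducer still pins the param

With stash_post_fix instead, the same check prints False, because nothing keeps the param alive once the caller drops it. The diff below applies exactly that restructuring: the extra-large-param check is nested under the contiguous_gradients guard.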
@@ -839,10 +839,10 @@ class DeepSpeedZeroOptimizer(ZeROOptimizer):
             Gradient computed twice for this partition. \
             Multiple gradient reduction is currently not supported"
-        if param.numel() > self.reduce_bucket_size:
-            self.extra_large_param_to_reduce = param
-        elif self.contiguous_gradients:
-            # keeping the gradients contiguous to prevent memory fragmentation, and avoid flattening
-            new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
-            new_grad_tensor.copy_(param.grad.view(-1))
+        if self.contiguous_gradients:
+            if param.numel() > self.reduce_bucket_size:
+                self.extra_large_param_to_reduce = param
+            else:
+                # keeping the gradients contiguous to prevent memory fragmentation, and avoid flattening
+                new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
+                new_grad_tensor.copy_(param.grad.view(-1))
......
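
For reference, here is a hedged sketch of what the contiguous-gradients branch does after the fix. The module-level names (reduce_bucket_size, ipg_buffer, ipg_index, elements_in_ipg_bucket) are simplified stand-ins for the optimizer's state with assumed illustrative sizes, not DeepSpeed's defaults; only the branch body mirrors the diff.

import torch

# Simplified stand-ins for the optimizer state referenced in the diff
# (assumed illustrative values).
reduce_bucket_size = 1_000
ipg_buffer = [torch.zeros(2_000)]  # one preallocated flat bucket
ipg_index = 0
elements_in_ipg_bucket = 0
extra_large_param_to_reduce = None


def copy_grad_into_bucket(param):
    """Mirrors the fixed branch: runs only when contiguous gradients are on."""
    global elements_in_ipg_bucket, extra_large_param_to_reduce
    if param.numel() > reduce_bucket_size:
        # Too big for the bucket: remember it for a dedicated reduction.
        extra_large_param_to_reduce = param
    else:
        # Copy the grad into a slice of the flat buffer and re-point
        # param.grad at that slice, keeping gradients contiguous and
        # avoiding a later flatten before the reduce.
        new_grad_tensor = ipg_buffer[ipg_index].narrow(0, elements_in_ipg_bucket, param.numel())
        new_grad_tensor.copy_(param.grad.view(-1))
        param.grad.data = new_grad_tensor.data.view_as(param.grad)
        elements_in_ipg_bucket += param.numel()


p = torch.nn.Parameter(torch.randn(10, 10))
p.grad = torch.ones_like(p)
copy_grad_into_bucket(p)
assert p.grad.data_ptr() == ipg_buffer[0].data_ptr()  # grad now lives in the bucket

Because the whole branch is now guarded by contiguous_gradients, neither the buffer copy nor the extra_large_param_to_reduce assignment happens when the feature is off.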