Unverified commit 01d17492, authored by hablb, committed by GitHub

Fix memory leak in zero2 contiguous gradients (#3306)

extra_large_param_to_reduce is never consumed when contiguous_gradients is False, yet it was still being assigned, which kept a reference to the param alive for the lifetime of the application.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Parent 0e357666
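To make the leak described in the commit message concrete, here is a minimal sketch of the pattern (hypothetical class and method names, not the actual DeepSpeed code): before the fix, the attribute was assigned whenever the param exceeded the reduce bucket size, but nothing on the non-contiguous path ever reads or clears it, so the optimizer object held the last oversized param alive indefinitely.

```python
import torch


class OptimizerSketch:
    """Hypothetical stand-in for the ZeRO stage-2 optimizer (illustrative only)."""

    def __init__(self, reduce_bucket_size, contiguous_gradients):
        self.reduce_bucket_size = reduce_bucket_size
        self.contiguous_gradients = contiguous_gradients
        self.extra_large_param_to_reduce = None

    def grad_ready_pre_fix(self, param):
        # Pre-fix: assigned whenever the param exceeds the bucket size, even
        # though only the contiguous path ever consumes and clears it.
        if param.numel() > self.reduce_bucket_size:
            self.extra_large_param_to_reduce = param

    def grad_ready_post_fix(self, param):
        # Post-fix: only assigned on the path that actually uses it.
        if self.contiguous_gradients and param.numel() > self.reduce_bucket_size:
            self.extra_large_param_to_reduce = param


opt = OptimizerSketch(reduce_bucket_size=8, contiguous_gradients=False)
big = torch.zeros(16, requires_grad=True)
opt.grad_ready_pre_fix(big)
print(opt.extra_large_param_to_reduce is big)    # True: param kept alive

opt.extra_large_param_to_reduce = None
opt.grad_ready_post_fix(big)
print(opt.extra_large_param_to_reduce is None)   # True: no dangling reference
```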
@@ -839,14 +839,14 @@ class DeepSpeedZeroOptimizer(ZeROOptimizer):
             Gradient computed twice for this partition. \
             Multiple gradient reduction is currently not supported"
-        if param.numel() > self.reduce_bucket_size:
-            self.extra_large_param_to_reduce = param
-        elif self.contiguous_gradients:
-            # keeping the gradients contiguous to prevent memory fragmentation, and avoid flattening
-            new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
-            new_grad_tensor.copy_(param.grad.view(-1))
-            param.grad.data = new_grad_tensor.data.view_as(param.grad)
+        if self.contiguous_gradients:
+            if param.numel() > self.reduce_bucket_size:
+                self.extra_large_param_to_reduce = param
+            else:
+                # keeping the gradients contiguous to prevent memory fragmentation, and avoid flattening
+                new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
+                new_grad_tensor.copy_(param.grad.view(-1))
+                param.grad.data = new_grad_tensor.data.view_as(param.grad)
         self.elements_in_ipg_bucket += param.numel()
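Why guarding the assignment with contiguous_gradients is safe: the attribute is only ever read on the contiguous-gradients reduction path, which also resets it after reducing. Below is a simplified, self-contained sketch of that consumption step; the names and bodies are illustrative (average_tensor here is a stand-in, not DeepSpeed's implementation).

```python
import torch


class ReductionSketch:
    """Illustrative consumption side; not the actual DeepSpeed code."""

    def __init__(self, bucket_numel):
        self.contiguous_gradients = True
        self.extra_large_param_to_reduce = None
        self.ipg_buffer = [torch.zeros(bucket_numel)]
        self.ipg_index = 0

    def average_tensor(self, flat_grad):
        # Stand-in for the real averaging/all-reduce of the flat gradient.
        flat_grad.div_(2.0)

    def reduce_ipg_grads(self):
        # Only this contiguous path reads extra_large_param_to_reduce, and it
        # drops the reference once the oversized gradient has been reduced.
        # The non-contiguous path never touches it, so assigning it there
        # (as before the fix) left the reference in place indefinitely.
        if self.contiguous_gradients:
            if self.extra_large_param_to_reduce is not None:
                self.average_tensor(self.extra_large_param_to_reduce.grad.view(-1))
                self.extra_large_param_to_reduce = None
            else:
                self.average_tensor(self.ipg_buffer[self.ipg_index])
```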