Unverified · Commit 52907a66 · Author: 郭叶军 · Committer: GitHub

stage3.py: do not scale if gradient_predivide_factor is 1.0 (#3630)

this change also aligns with the logic before reduce_scatter_coalesced
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Parent 49a73549
@@ -1122,7 +1122,8 @@ class DeepSpeedZeroOptimizer_Stage3(ZeROOptimizer):
 grad_partitions_for_rank = reduce_scatter_coalesced(full_grads_for_rank, self.dp_process_group)
-if self.postscale_gradients and self.gradient_predivide_factor != dist.get_world_size(self.dp_process_group):
+if self.postscale_gradients and self.gradient_predivide_factor != 1.0 and self.gradient_predivide_factor != dist.get_world_size(
+        self.dp_process_group):
     grad_partitions_for_rank = [g.mul(self.gradient_predivide_factor) for g in grad_partitions_for_rank]
 if self.communication_data_type != self.dtype:
......
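The guard introduced by this change can be read in isolation as a small predicate. The sketch below is illustrative only and is not DeepSpeed code: `should_postscale` and `postscale` are hypothetical helper names, plain floats stand in for gradient tensors, and `world_size` stands in for `dist.get_world_size(self.dp_process_group)`. It shows that a factor of 1.0 now skips the multiplication entirely, so no new list of scaled gradients is built.

```python
from typing import List


def should_postscale(postscale_gradients: bool, predivide_factor: float, world_size: int) -> bool:
    """Hypothetical mirror of the new condition in the diff above."""
    return (postscale_gradients
            and predivide_factor != 1.0          # new check added by this commit
            and predivide_factor != world_size)  # check that already existed


def postscale(grads: List[float], postscale_gradients: bool,
              predivide_factor: float, world_size: int) -> List[float]:
    """Scale the gradients only when the guard allows it; otherwise return them untouched."""
    if should_postscale(postscale_gradients, predivide_factor, world_size):
        # Stand-in for [g.mul(factor) for g in grad_partitions_for_rank]
        return [g * predivide_factor for g in grads]
    return grads


if __name__ == "__main__":
    grads = [1.0, 2.0]
    # Factor 1.0: the same list object comes back, nothing is recomputed.
    print(postscale(grads, True, 1.0, 8) is grads)
    # Factor 2.0 with world size 8: gradients are multiplied by 2.0.
    print(postscale(grads, True, 2.0, 8))
```

In the real optimizer `g.mul(...)` returns a new tensor for every partition, so skipping the call when the factor is 1.0 avoids that extra work without changing the values.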