stage3.py: do not scale if gradient_predivide_factor is 1.0 (#3630)
This change also aligns with the logic applied before the reduce_scatter_coalesced call.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
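
A minimal sketch of the idea behind this change, not the actual stage3.py code: when the predivide factor is 1.0, scaling is a no-op, so it can be skipped. The function name `maybe_predivide` and the use of plain Python floats in place of tensors are assumptions made for illustration only.

```python
from typing import List


def maybe_predivide(grads: List[float], gradient_predivide_factor: float) -> List[float]:
    """Divide gradients by the factor, skipping the work when the factor is 1.0.

    Hypothetical helper: plain floats stand in for gradient tensors.
    """
    if gradient_predivide_factor == 1.0:
        # Dividing by 1.0 changes nothing, so return the input untouched
        # and avoid an extra pass (and extra allocations) over the gradients.
        return grads
    return [g / gradient_predivide_factor for g in grads]
```

With a factor of 1.0 the input list is returned as-is; with any other factor a new, scaled list is produced.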