Unverified commit 7a132a9f, author: Stas Bekman, committer: GitHub

port OVERFLOW log to ZeRO-2 (#1593)

Parent d14baad9
......@@ -1626,6 +1626,14 @@ class FP16_DeepSpeedZeroOptimizer(object):
        prev_scale = self.loss_scale
        self._update_scale(self.overflow)
        if self.overflow:
            if dist.get_rank() == 0:
                logger.info(
                    "[deepscale] OVERFLOW! Rank {} Skipping step. Attempted loss scale: {}, "
                    "reducing to {}".format(dist.get_rank(),
                                            prev_scale,
                                            self.loss_scale))
            see_memory_usage('After overflow before clearing gradients')
            self.zero_grad()
            if self.cpu_offload:
......
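
The hunk above records the loss scale before `_update_scale` runs, then logs the old and new values on rank 0 only when an overflow was detected. A minimal standalone sketch of that pattern follows. It is not DeepSpeed code: `ToyLossScaler`, `step`, the halving policy, and the `rank` argument are hypothetical stand-ins, since the diff does not show how `_update_scale` actually changes the scale.

```python
import logging

logger = logging.getLogger("overflow_sketch")


class ToyLossScaler:
    """Hypothetical stand-in for the optimizer's loss-scale state."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, min_scale=1.0):
        self.loss_scale = init_scale
        self.scale_factor = scale_factor
        self.min_scale = min_scale

    def update_scale(self, overflow):
        # Assumed policy: divide the scale by scale_factor on overflow,
        # never going below min_scale. The real policy may differ.
        if overflow:
            self.loss_scale = max(self.loss_scale / self.scale_factor,
                                  self.min_scale)


def step(scaler, overflow, rank=0):
    """Return True if the step would be applied, False if it is skipped."""
    prev_scale = scaler.loss_scale  # capture before the scale changes
    scaler.update_scale(overflow)
    if overflow:
        if rank == 0:  # log once, not once per process
            logger.info(
                "OVERFLOW! Rank {} Skipping step. Attempted loss scale: {}, "
                "reducing to {}".format(rank, prev_scale, scaler.loss_scale))
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    scaler = ToyLossScaler(init_scale=1024.0)
    step(scaler, overflow=True)   # logs and skips
    step(scaler, overflow=False)  # proceeds silently
```

Capturing `prev_scale` before the update is what lets the message report both the attempted scale and the reduced one; reading `loss_scale` only after the update would show the new value twice.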