Unverified commit 06938835, authored by Ma, Guokai, committed by GitHub

Support fp32 gradaccum for bf16 model (#2566)

* allow bf16 model with fp32 gradient accumulation datatype

* allow fp32 gradient accumulation and bfloat16 model in amp mode

* alternative fix for grad accumulation type mismatch: in the case of the ZeRO optimizer we should have grad accum type == model data type
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Parent: 2d8f3f56
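For context, a minimal usage sketch of how a user might request fp32 gradient accumulation for a bf16 model after this change. The exact config key path ("data_types" / "grad_accum_dtype") is an assumption inferred from self._config.grad_accum_dtype in the diff below; the placeholder model and batch size are for illustration only.

import torch
import deepspeed

# Hypothetical config: the "data_types"/"grad_accum_dtype" key path is an
# assumption; check the DeepSpeed config documentation for the authoritative schema.
ds_config = {
    "train_batch_size": 8,
    "bf16": {"enabled": True},                   # bf16 model weights/compute
    "data_types": {"grad_accum_dtype": "fp32"},  # accumulate gradients in fp32
}

model = torch.nn.Linear(16, 16)  # placeholder model for illustration

engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               model_parameters=model.parameters(),
                                               config=ds_config)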
@@ -807,7 +807,7 @@ class DeepSpeedEngine(Module):
             model_dtype = torch.bfloat16
         if self._config.grad_accum_dtype == None:
-            if model_dtype == torch.bfloat16:
+            if model_dtype == torch.bfloat16 and not self.zero_optimization():
                 grad_accum_dtype = torch.float32
             else:
                 grad_accum_dtype = model_dtype
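Below is a minimal, self-contained sketch of the default-selection rule implied by this hunk (the function name and arguments are hypothetical; the real logic lives inside DeepSpeedEngine and also consults amp/fp16 modes): when no grad_accum_dtype is configured, a bf16 model defaults to fp32 gradient accumulation unless ZeRO is enabled, in which case accumulation uses the model dtype.

import torch

def default_grad_accum_dtype(model_dtype, configured_dtype=None, zero_enabled=False):
    # Hypothetical stand-alone version of the rule shown in the diff above.
    if configured_dtype is not None:
        return configured_dtype
    if model_dtype == torch.bfloat16 and not zero_enabled:
        return torch.float32   # bf16 model without ZeRO: accumulate in fp32
    return model_dtype         # otherwise the accumulation dtype follows the model

assert default_grad_accum_dtype(torch.bfloat16) == torch.float32
assert default_grad_accum_dtype(torch.bfloat16, zero_enabled=True) == torch.bfloat16
assert default_grad_accum_dtype(torch.float16) == torch.float16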