Unverified commit 07c729aa, authored by Jiabin Yang, committed by GitHub

[Eager] Fix sharding in eager (#44271)

* fix sharding in eager

* support eager sharding
Parent d6d60cbc
...
@@ -210,6 +210,7 @@ class GroupShardedStage2(nn.Layer):
             scale=self._world_size_scaling)
     # Scale grads of params
+    with paddle.no_grad():
         for param in self._trainable_params:
             if param.name in self._param_grads and param.grad is not None:
                 param.grad.scale_(scale=self._world_size_scaling)
......