Unverified commit 07c729aa, authored by Jiabin Yang, committed by GitHub

[Eager] Fix sharding in eager (#44271)

* fix sharding in eager

* support eager sharding
Parent d6d60cbc
......
@@ -210,9 +210,10 @@ class GroupShardedStage2(nn.Layer):
                 scale=self._world_size_scaling)
         # Scale grads of params
-        for param in self._trainable_params:
-            if param.name in self._param_grads and param.grad is not None:
-                param.grad.scale_(scale=self._world_size_scaling)
+        with paddle.no_grad():
+            for param in self._trainable_params:
+                if param.name in self._param_grads and param.grad is not None:
+                    param.grad.scale_(scale=self._world_size_scaling)
                 # param._reset_grad_inplace_version(True)
         # Scale grads of master params with offload strategy
......