Created by: chenwhql
PR types
New features
PR changes
Others
Describe
This PR supports SelectedRows allreduce in multi-card imperative mode.
Before this PR, setting sparse gradient updates for Embedding parameters in multi-card imperative mode threw an error like:
----------------------
Error Message Summary:
----------------------
InvalidArgumentError: The 'shape' in ReshapeOp is invalid. The input tensor X'size must be equal to the capacity of 'shape'. But received X's shape = [1000, 10], X's size = 10000, 'shape' is [120], the capacity of 'shape' is 120.
[Hint: Expected capacity == in_size, but received capacity:120 != in_size:10000.] at (/work/paddle/paddle/fluid/operators/reshape_op.cc:206)
This PR fixes this problem.
A brief introduction to the implementation ideas:
- The difference between LoDTensor and SelectedRows
- The allreduce of LoDTensor
- The allreduce of SelectedRows
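The three points above can be illustrated with a small NumPy simulation. This is only a sketch, not the code in this PR: a SelectedRows-like gradient is modeled as a pair of row ids and value rows, and the table shape, the per-card data, and the helper names `to_dense`, `dense_allreduce`, and `sparse_allreduce` are all hypothetical. It assumes that dense gradients have the same shape on every card and can be summed elementwise, while sparse gradients may touch a different number of rows on each card, so they are combined by gathering every card's rows and values.

```python
import numpy as np

HEIGHT, WIDTH = 6, 2  # hypothetical embedding table shape, for illustration only


def to_dense(rows, values, height=HEIGHT):
    """Scatter-add a SelectedRows-like (rows, values) pair into a dense array."""
    dense = np.zeros((height, values.shape[1]), dtype=values.dtype)
    np.add.at(dense, rows, values)  # duplicate row ids accumulate
    return dense


# Each simulated card holds a sparse gradient: the touched row ids
# plus one value row per id. Note the cards touch different row counts.
card_grads = [
    (np.array([0, 3]), np.ones((2, WIDTH), dtype=np.float32)),
    (np.array([3, 5, 5]), 2 * np.ones((3, WIDTH), dtype=np.float32)),
]


def dense_allreduce(dense_grads):
    """Dense case: same shape on every card, so sum elementwise."""
    return np.sum(dense_grads, axis=0)


def sparse_allreduce(sparse_grads):
    """Sparse case (assumed scheme): gather rows and values from every card."""
    rows = np.concatenate([r for r, _ in sparse_grads])
    values = np.concatenate([v for _, v in sparse_grads], axis=0)
    return rows, values


dense_result = dense_allreduce([to_dense(r, v) for r, v in card_grads])
merged_rows, merged_values = sparse_allreduce(card_grads)
```

Scattering the gathered result back into a dense array gives the same gradient as the dense allreduce, which is the property the sparse path has to preserve.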
NOTE:
- Because ncclBroadcast is an API that is only available from NCCL version 2.2.12 onward, SelectedRows allreduce is not supported with older NCCL versions. In that case, the following error hint is given:
----------------------
Error Message Summary:
----------------------
UnimplementedError: Imperative SelectedRows allreduce is not supported when paddle is compiled with NCCL verison lower than v2.2.12. You can set is_sparse=False for the Layer containing this argument, such as Embedding(is_sparse=False). at (/workspace/Paddle/paddle/fluid/pybind/imperative.cc:781)
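On the user side, the workaround in the hint above could be applied with a small version gate. The following is a hedged sketch only: `MIN_NCCL_VERSION`, `supports_selected_rows_allreduce`, and `choose_is_sparse` are hypothetical helpers, not Paddle APIs, and how the NCCL version tuple is obtained is left to the caller.

```python
MIN_NCCL_VERSION = (2, 2, 12)  # threshold taken from the error hint above


def supports_selected_rows_allreduce(nccl_version):
    """Return True if the given (major, minor, patch) NCCL version is new enough."""
    return tuple(nccl_version) >= MIN_NCCL_VERSION


def choose_is_sparse(requested_sparse, nccl_version):
    """Fall back to dense gradients (is_sparse=False) on older NCCL versions."""
    return requested_sparse and supports_selected_rows_allreduce(nccl_version)
```

The returned value could then be passed as the `is_sparse` argument of the layer, for example `Embedding(is_sparse=choose_is_sparse(True, version))`, where `version` is an assumed tuple supplied by the caller.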