Created by: LutaoChu
PR types
Performance optimization
PR changes
OPs
Describe
Optimize the argsort Op's performance on GPU for the case where the input size equals the length of the ‘axis’ dimension, i.e. the whole input is a single slice along the sort axis.
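To make the targeted case concrete, here is a minimal sketch in plain Python of the shape condition and of what argsort returns for such an input. The names `is_single_slice_sort` and `argsort_1d` are hypothetical and used only for illustration; this is not the kernel code changed in this PR.

```python
from math import prod


def is_single_slice_sort(shape, axis):
    """Return True when the total element count equals shape[axis].

    Hypothetical helper: it mirrors the condition described above,
    under which every other dimension has length 1.
    """
    ndim = len(shape)
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
    return prod(shape) == shape[axis]


def argsort_1d(values, descending=False):
    """Reference argsort for one slice: indices that would sort `values`."""
    return sorted(range(len(values)), key=values.__getitem__, reverse=descending)


print(is_single_slice_sort((1, 1024), axis=1))   # one slice along axis 1
print(is_single_slice_sort((4, 1024), axis=1))   # four slices, condition not met
print(argsort_1d([3.0, 1.0, 2.0]))
print(argsort_1d([3.0, 1.0, 2.0], descending=True))
```

In this situation there is only one sequence to sort, so the work cannot be spread across independent slices and the sort itself has to be parallelized.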
Argsort performance benchmark:
Conclusion
- After optimization, performance improves 34x for forward ascending sort, 31x for forward descending sort, and 10x for gradient calculation.
- Forward speed is 2.3 times that of PyTorch.