Created by: zhangting2020
PR types: Performance optimization
PR changes: OPs
Describe
Using IndexList instead of arrays of indices can improve performance on both CPU and GPU. This PR uses it to speed up the dot op. For more information, please refer to #25132.
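To illustrate the idea, here is a minimal sketch in plain standard C++. It is not the code from this PR and does not use the actual IndexList type (presumably the one from Eigen's Tensor module). The function names `dot_runtime_axis` and `dot_static_axis` are hypothetical. The sketch only shows the difference between passing the reduced axis as a run-time value and encoding it in the type, where the compiler can see it as a constant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise product of two row-major [rows, cols] matrices, summed
// along one axis. Run-time variant: `axis` is an ordinary value, so the
// choice of output slot depends on data the compiler may not know.
std::vector<float> dot_runtime_axis(const std::vector<float>& x,
                                    const std::vector<float>& y,
                                    std::size_t rows, std::size_t cols,
                                    std::size_t axis) {
  assert(x.size() == rows * cols && y.size() == rows * cols);
  assert(axis < 2);
  std::vector<float> out(axis == 1 ? rows : cols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[axis == 1 ? r : c] += x[r * cols + c] * y[r * cols + c];
    }
  }
  return out;
}

// Compile-time variant (hypothetical analogue of a static index list):
// the axis is a template parameter, so `Axis == 1` is a constant
// expression in every instantiation.
template <std::size_t Axis>
std::vector<float> dot_static_axis(const std::vector<float>& x,
                                   const std::vector<float>& y,
                                   std::size_t rows, std::size_t cols) {
  static_assert(Axis < 2, "only 2-D inputs are handled in this sketch");
  assert(x.size() == rows * cols && y.size() == rows * cols);
  std::vector<float> out(Axis == 1 ? rows : cols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[Axis == 1 ? r : c] += x[r * cols + c] * y[r * cols + c];
    }
  }
  return out;
}
```

Calling `dot_static_axis<1>(x, y, rows, cols)` gives the same result as `dot_runtime_axis(x, y, rows, cols, 1)`. Whether the static form is actually faster depends on the compiler and the expression library, so the numbers in this PR should be taken as the measured evidence, not this sketch.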
Performance
GPU: V100, CUDA 10
op | input shape | before | after | speed up |
---|---|---|---|---|
dot | [1000, 1000] | 0.210704 ms | 0.125058 ms | 1.7x |