Created by: zhangting2020
PR types: Performance optimization
PR changes: OPs
Describe
Eigen's IndexList encodes a set of Tensor dimensions or indices. Each index in the list can be known at compile time or at runtime, and static and dynamic indices can be mixed in the same list. The tensor code uses the indices that are known at compile time to optimize the code it generates, so replacing arrays of indices with IndexList can speed up execution on both CPU and GPU.
Note:
This functionality requires a C++11-compliant compiler. With an older compiler we have to fall back to arrays of indices, which is why the code is guarded by the EIGEN_HAS_INDEX_LIST macro.
Performance
CPU:
op | input shape | before (index arrays) | after (IndexList) | speedup |
---|---|---|---|---|
instance_norm | [1, 64, 128, 128] | 41.2712 ms | 3.21222 ms | 13x |
instance_norm_grad | [1, 64, 128, 128] | 149.14 ms | 11.949 ms | 12x |
instance_norm | [1, 128, 64, 64] | 20.6193 ms | 1.61026 ms | 13x |
instance_norm_grad | [1, 128, 64, 64] | 74.4767 ms | 5.19748 ms | 14x |
instance_norm | [1, 256, 32, 32] | 10.308 ms | 0.821658 ms | 13x |
instance_norm_grad | [1, 256, 32, 32] | 37.1751 ms | 2.60926 ms | 14x |