[Speed] Benchmarks of some operations on Eigen Tensor
Created by: dzhwinter
AFAIK, some operations on Eigen Tensor are quite slow. Please be careful if you use the `broadcast` or `chip` operations.
Here is a detailed list of results for many operators:

| Benchmark | Iterations | ns/op | MFlops/s |
|---|---:|---:|---:|
| BM_algebraicFunc/10 | 500000 | 3310 | 30.21 |
| BM_algebraicFunc/80 | 500000 | 3764 | 1700.24 |
| BM_algebraicFunc/640 | 200000 | 14676 | 27908.66 |
| BM_algebraicFunc/4K | 2000 | 845890 | 29554.65 |
| BM_broadcasting/10 | 500000 | 3357 | 29.79 |
| BM_broadcasting/80 | 500000 | 3375 | 1895.82 |
| BM_broadcasting/640 | 500000 | 4420 | 92663.99 |
| BM_broadcasting/4K | 10000 | 267284 | 93533.46 |
| BM_coeffWiseOp/10 | 500000 | 3266 | 91.83 |
| BM_coeffWiseOp/80 | 500000 | 3321 | 5780.22 |
| BM_coeffWiseOp/640 | 200000 | 14623 | 84028.95 |
| BM_coeffWiseOp/4K | 2000 | 850300 | 88204.08 |
| BM_colChip/10 | 500000 | 4011 | 2.49 |
| BM_colChip/80 | 500000 | 3919 | 20.41 |
| BM_colChip/640 | 500000 | 3197 | 200.15 |
| BM_colChip/4K | 500000 | 3233 | 1546.26 |
| BM_colReduction/10 | 500000 | 3478 | 28.75 |
| BM_colReduction/80 | 200000 | 11161 | 573.42 |
| BM_colReduction/640 | 200000 | 7870 | 52042.19 |
| BM_colReduction/4K | 5000 | 308427 | 81056.28 |
| BM_contraction_64xNxN/10 | 500000 | 7090 | 1805.14 |
| BM_contraction_64xNxN/80 | 100000 | 15501 | 52845.46 |
| BM_contraction_64xNxN/640 | 50000 | 46078 | 1137805.85 |
| BM_contraction_64xNxN/4K | 2000 | 897852 | 3564061.12 |
| BM_contraction_Nx64xN/10 | 200000 | 13725 | 932.55 |
| BM_contraction_Nx64xN/80 | 200000 | 14890 | 55013.27 |
| BM_contraction_Nx64xN/640 | 100000 | 21537 | 2434306.21 |
| BM_contraction_Nx64xN/4K | 2000 | 1181087 | 2709366.40 |
| BM_contraction_NxNx64/10 | 200000 | 8660 | 1477.96 |
| BM_contraction_NxNx64/80 | 100000 | 15477 | 52927.86 |
| BM_contraction_NxNx64/640 | 50000 | 46470 | 1128214.28 |
| BM_contraction_NxNx64/4K | 5000 | 564143 | 5672317.32 |
| BM_contraction_NxNxN/10 | 200000 | 8654 | 231.08 |
| BM_contraction_NxNxN/80 | 100000 | 15638 | 65479.29 |
| BM_contraction_NxNxN/640 | 20000 | 98106 | 5344083.13 |
| BM_contraction_NxNxN/4K | 50 | 42695967 | 5855353.87 |
| BM_convolution_1x7/128 | 100000 | 16215 | 13482.31 |
| BM_convolution_1x7/1K | 20000 | 91143 | 160121.78 |
| BM_convolution_1x7/4K | 1000 | 2009086 | 173999.44 |
| BM_convolution_4x7/128 | 1000000 | 1633 | 522961.93 |
| BM_convolution_4x7/1K | 1000000 | 1565 | 37174331.84 |
| BM_convolution_4x7/4K | 1000000 | 1614 | 865610908.25 |
| BM_convolution_64x7/128 | 10000 | 106848 | 66498.74 |
| BM_convolution_64x7/1K | 5000 | 730637 | 1199713.33 |
| BM_convolution_64x7/4K | 100 | 16002777 | 1380461.57 |
| BM_convolution_7x1/128 | 200000 | 11202 | 19515.29 |
| BM_convolution_7x1/1K | 50000 | 44060 | 331224.30 |
| BM_convolution_7x1/4K | 2000 | 861338 | 405856.86 |
| BM_convolution_7x4/128 | 100000 | 18575 | 45973.54 |
| BM_convolution_7x4/1K | 20000 | 84771 | 686610.84 |
| BM_convolution_7x4/4K | 1000 | 1775696 | 787004.51 |
| BM_convolution_7x64/128 | 50000 | 40714 | 174516.31 |
| BM_convolution_7x64/1K | 5000 | 625639 | 1401054.53 |
| BM_convolution_7x64/4K | 100 | 14568089 | 1516411.54 |
| BM_fullReduction/10 | 500000 | 3428 | 29.17 |
| BM_fullReduction/80 | 500000 | 3428 | 1866.83 |
| BM_fullReduction/640 | 200000 | 7429 | 55135.02 |
| BM_fullReduction/4K | 10000 | 276669 | 90360.50 |
| BM_memcpy/10 | 500000 | 3919 | 25.52 |
| BM_memcpy/80 | 500000 | 3847 | 1663.29 |
| BM_memcpy/640 | 500000 | 6807 | 60169.97 |
| BM_memcpy/4K | 5000 | 564178 | 44312.24 |
| BM_padding/10 | 500000 | 3332 | 30.01 |
| BM_padding/80 | 500000 | 3349 | 1910.98 |
| BM_padding/640 | 500000 | 6592 | 62132.65 |
| BM_padding/4K | 5000 | 577463 | 43292.77 |
| BM_random/10 | 500000 | 3236 | 30.90 |
| BM_random/80 | 500000 | 3269 | 1957.55 |
| BM_random/640 | 500000 | 3994 | 102551.78 |
| BM_random/4K | 10000 | 266284 | 93884.58 |
| BM_rowChip/10 | 500000 | 3328 | 3.00 |
| BM_rowChip/80 | 500000 | 3220 | 24.84 |
| BM_rowChip/640 | 500000 | 3217 | 198.89 |
| BM_rowChip/4K | 500000 | 3269 | 1529.47 |
| BM_rowReduction/10 | 500000 | 3317 | 30.14 |
| BM_rowReduction/80 | 500000 | 4868 | 1314.62 |
| BM_rowReduction/640 | 50000 | 50880 | 8050.24 |
| BM_rowReduction/4K | 5000 | 352852 | 70851.06 |
| BM_shuffling/10 | 500000 | 3224 | 31.01 |
| BM_shuffling/80 | 500000 | 3223 | 1985.13 |
| BM_shuffling/640 | 200000 | 13123 | 31211.57 |
| BM_shuffling/4K | 500 | 3140372 | 7960.84 |
| BM_slicing/10 | 200000 | 13020 | 7.68 |
| BM_slicing/80 | 200000 | 13710 | 466.81 |
| BM_slicing/640 | 100000 | 16364 | 25030.39 |
| BM_slicing/4K | 2000 | 717139 | 34860.74 |
| BM_striding/10 | 500000 | 3483 | 28.70 |
| BM_striding/80 | 500000 | 4039 | 1584.26 |
| BM_striding/640 | 500000 | 3593 | 113968.07 |
| BM_striding/4K | 5000 | 320700 | 77954.47 |
| BM_transcendentalFunc/10 | 500000 | 3175 | 31.49 |
| BM_transcendentalFunc/80 | 500000 | 3211 | 1992.60 |
| BM_transcendentalFunc/640 | 200000 | 14442 | 28359.91 |
| BM_transcendentalFunc/4K | 2000 | 846544 | 29531.83 |
| BM_typeCasting/10 | 500000 | 3192 | 31.33 |
| BM_typeCasting/80 | 500000 | 3289 | 1945.38 |
| BM_typeCasting/640 | 500000 | 6875 | 59571.46 |
| BM_typeCasting/4K | 5000 | 604207 | 41376.54 |