Performance Regression Caused by Tensor Modification.
Created by: xiaolil1
Our regular tests show a performance regression at batch size 1 on 28 cores. We root-caused it to the "set_format" code change introduced by the Tensor modification in commit dec9cf53. On CLX, the regression is about 3-5% for FP32 and 8-11% for INT8 (see the tables below):

INT8 (BS = 1, Cores = 28) | ResNet-50 | MobileNet | SSD-MobileNet |
---|---|---|---|
Current (dec9c) | 236 | 494 | 169 |
Last (08c96) | 265 | 536 | 185 |
Curr / Last | 0.89 | 0.92 | 0.92 |

FP32 (BS = 1, Cores = 28) | ResNet-50 | MobileNet | SSD-MobileNet |
---|---|---|---|
Current (dec9c) | 97 | 298 | 116 |
Last (08c96) | 100 | 309 | 122 |
Curr / Last | 0.97 | 0.97 | 0.95 |

The current implementation always creates the primitive descriptor in "set_format", which seems unnecessary and adds potential performance overhead. Moreover, "set_format" defaults the data type to FP32, whereas the API should handle both FP32 and INT8 (e.g., conv_mkldnn_op.cc:632).
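
To make the suggestion concrete, here is a minimal sketch of one possible direction: "set_format" only records the layout and data type, and the descriptor is built lazily on first use. Everything in it (`Tensor`, `Format`, `DataType`, `PrimitiveDesc`, `primitive_desc()`, `build_count()`) is a hypothetical stand-in written for illustration. It is not the actual PaddlePaddle or MKL-DNN API, and the real fix may look quite different.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; NOT the PaddlePaddle/MKL-DNN types.
enum class DataType { FP32, INT8 };
enum class Format { nchw, nhwc };

// Stand-in for a primitive descriptor that is assumed to be costly to build.
struct PrimitiveDesc {
  Format format;
  DataType dtype;
};

class Tensor {
 public:
  // Records layout and data type only; no descriptor is created here.
  // The data type is an explicit argument (no FP32 default), so INT8
  // callers cannot silently fall back to FP32.
  void set_format(Format format, DataType dtype) {
    if (format != format_ || dtype != dtype_) {
      has_desc_ = false;  // invalidate the cached descriptor on a real change
    }
    format_ = format;
    dtype_ = dtype;
  }

  // Builds the descriptor on first use and caches it for later calls.
  const PrimitiveDesc& primitive_desc() {
    if (!has_desc_) {
      desc_ = PrimitiveDesc{format_, dtype_};
      has_desc_ = true;
      ++build_count_;
    }
    return desc_;
  }

  // Number of times a descriptor was actually built (for inspection only).
  std::size_t build_count() const { return build_count_; }

 private:
  Format format_ = Format::nchw;
  DataType dtype_ = DataType::FP32;
  PrimitiveDesc desc_{Format::nchw, DataType::FP32};
  bool has_desc_ = false;
  std::size_t build_count_ = 0;
};
```

With this shape, repeated "set_format" calls carrying the same format and data type cost nothing beyond two assignments, and the descriptor is rebuilt only when either value actually changes and something then asks for it.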