Performance Regression Caused by Tensor Modification. (#16248) · Issue · PaddlePaddle / Paddle


Created by: xiaolil1

Our regular tests, run with batch size 1 on 28 cores, show a performance regression. We traced it to the change to "set_format" introduced by the Tensor modification in commit dec9cf53. Performance dropped by 3–5% for FP32 and by 8–11% for INT8 on CLX (Cascade Lake), as shown in the tables below:

INT8 (BS = 1, Cores = 28)	ResNet-50	MobileNet	SSD-MobileNet
Current (dec9c)	236	494	169
Previous (08c96)	265	536	185
Current / Previous	0.89	0.92	0.92

FP32 (BS = 1, Cores = 28)	ResNet-50	MobileNet	SSD-MobileNet
Current (dec9c)	97	298	116
Previous (08c96)	100	309	122
Current / Previous	0.97	0.97	0.95

The current implementation always creates the primitive descriptor inside "set_format", which appears unnecessary and adds overhead on every call. Moreover, "set_format" assumes the default data type FP32, whereas the API should handle both FP32 and INT8 (e.g., conv_mkldnn_op.cc:632).
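One possible direction, sketched minimally below, is lazy construction with caching. "set_format" only records the layout and the data type. The descriptor is built on first use and rebuilt only after one of them changes. The data type is an explicit parameter, so FP32 and INT8 callers are handled alike. Every name here (`Format`, `DataType`, `PrimitiveDesc`, `BuildPrimitiveDesc`, the simplified `Tensor`, and the `g_desc_builds` counter) is hypothetical and stands in for Paddle's and MKL-DNN's real types. This is not the actual fix for this issue.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins; NOT Paddle's or MKL-DNN's real API.
enum class DataType { kFP32, kS8, kU8 };
enum class Format { kUndef, kNCHW, kNHWC, kNChw16c };

struct PrimitiveDesc {
  Format format;
  DataType dtype;
};

// Hypothetical instrumentation: counts how often a descriptor is built.
static int g_desc_builds = 0;

// Stands in for the expensive descriptor construction.
PrimitiveDesc BuildPrimitiveDesc(Format f, DataType t) {
  ++g_desc_builds;
  return PrimitiveDesc{f, t};
}

class Tensor {
 public:
  // Cheap: only records format and data type; no descriptor is built here.
  // The data type is explicit, so INT8 callers cannot silently get FP32.
  void set_format(Format f, DataType t) {
    if (f != format_ || t != dtype_) {
      format_ = f;
      dtype_ = t;
      desc_.reset();  // invalidate the cached descriptor
    }
  }

  // Builds the descriptor lazily and reuses it until the format/type changes.
  const PrimitiveDesc& primitive_desc() {
    assert(format_ != Format::kUndef && "set_format must be called first");
    if (!desc_) desc_ = BuildPrimitiveDesc(format_, dtype_);
    return *desc_;
  }

 private:
  Format format_ = Format::kUndef;
  DataType dtype_ = DataType::kFP32;
  std::optional<PrimitiveDesc> desc_;
};
```

In this sketch, repeated "set_format" calls with the same arguments cost only two comparisons. An INT8 tensor is described as, for example, `t.set_format(Format::kNChw16c, DataType::kU8)`. The descriptor is built once, on the first `primitive_desc()` call.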
