Performance regression of box_coder op
Created by: hshen14
We recently upgraded to a new PaddlePaddle commit (1a252f4b, from 1/30) and ran SSD-MobileNet. There appears to be a performance regression in the box_coder op.
Before the upgrade, the profile was:
Event Calls Total Min. Max. Ave. Ratio.
thread0::conv2d 2350 37918.7 0.091484 50.9656 16.1356 0.906646
thread0::multiclass_nms 50 1262.89 24.996 26.1822 25.2578 0.0301961
thread0::prior_box 300 980.794 0.144806 11.5341 3.26931 0.023451
thread0::reshape2 1200 906.852 0.004585 8.57195 0.75571 0.0216831
thread0::transpose2 650 277.378 0.027228 2.43797 0.426735 0.00663217
thread0::softmax 50 221.55 4.39363 4.63009 4.43099 0.00529731
thread0::concat 200 156.415 0.071128 2.49121 0.782075 0.00373992
thread0::box_coder 50 68.9811 1.3714 1.39247 1.37962 0.00164936
thread0::flatten2 600 12.1264 0.012349 0.059559 0.0202107 0.000289946
thread0::fetch 50 11.177 0.216979 0.23537 0.22354 0.000267245
thread0::assign_value 600 5.6246 0.004578 0.02607 0.00937434 0.000134486
thread0::feed 50 0.559388 0.01022 0.012291 0.0111878 1.33751e-05
After the upgrade, the profile was:
Event Calls Total Min. Max. Ave. Ratio.
thread0::conv2d 2350 37742.3 0.086507 50.8266 16.0605 0.861045
thread0::box_coder 50 2216.7 44.2231 44.5352 44.334 0.0505714
thread0::multiclass_nms 50 1311.86 26.0629 27.1475 26.2373 0.0299286
thread0::prior_box 300 979.043 0.126828 11.6266 3.26348 0.0223357
thread0::reshape2 1200 911.08 0.005721 8.56092 0.759234 0.0207852
thread0::transpose2 650 255.189 0.025399 2.42074 0.392599 0.00582184
thread0::softmax 50 226.04 4.399 4.72333 4.52081 0.00515685
thread0::concat 200 159.471 0.073097 2.49975 0.797355 0.00363814
thread0::flatten2 600 14.4118 0.015236 0.057221 0.0240196 0.000328788
thread0::fetch 50 10.9862 0.209115 0.238141 0.219724 0.000250637
thread0::assign_value 600 5.58148 0.004008 0.036771 0.00930247 0.000127335
thread0::feed 50 0.447534 0.007618 0.019401 0.00895068 1.021e-05
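To quantify the change between the two summaries above, a small script can parse the profiler rows and report per-call slowdowns. This is a sketch that assumes the seven whitespace-separated columns shown above (Event, Calls, Total, Min., Max., Ave., Ratio.). The function names `parse_profile` and `slowdowns` and the 1.5x threshold are hypothetical choices for illustration, not part of Paddle's tooling.

```python
from typing import Dict


def parse_profile(text: str) -> Dict[str, float]:
    """Map event name -> average time per call (the 'Ave.' column)."""
    averages = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 7-column data row.
        if len(fields) != 7 or fields[0] == "Event":
            continue
        name, _calls, _total, _min, _max, ave, _ratio = fields
        averages[name] = float(ave)
    return averages


def slowdowns(before: str, after: str, threshold: float = 1.5) -> Dict[str, float]:
    """Return after/before average-time factors for ops at or above `threshold`."""
    old, new = parse_profile(before), parse_profile(after)
    result = {}
    # Only compare ops that appear in both profiles.
    for name in old.keys() & new.keys():
        factor = new[name] / old[name]
        if factor >= threshold:
            result[name] = factor
    return result


# Excerpt of the two profiles in this issue (header plus three rows each).
BEFORE = """
Event Calls Total Min. Max. Ave. Ratio.
thread0::conv2d 2350 37918.7 0.091484 50.9656 16.1356 0.906646
thread0::multiclass_nms 50 1262.89 24.996 26.1822 25.2578 0.0301961
thread0::box_coder 50 68.9811 1.3714 1.39247 1.37962 0.00164936
"""

AFTER = """
Event Calls Total Min. Max. Ave. Ratio.
thread0::conv2d 2350 37742.3 0.086507 50.8266 16.0605 0.861045
thread0::box_coder 50 2216.7 44.2231 44.5352 44.334 0.0505714
thread0::multiclass_nms 50 1311.86 26.0629 27.1475 26.2373 0.0299286
"""

if __name__ == "__main__":
    for op, factor in sorted(slowdowns(BEFORE, AFTER).items()):
        print(f"{op}: {factor:.1f}x slower per call")
```

On the excerpted rows this prints a single line, `thread0::box_coder: 32.1x slower per call`; conv2d and multiclass_nms stay below the threshold.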
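Between the two runs, box_coder's average time per call rose from 1.37962 to 44.334 (about 32x), and its share of total time rose from under 0.2% to about 5%, while the other ops stayed roughly unchanged. We noticed a recent commit to box_coder by @jerrywgz. Could you please take a look and confirm the issue? You can use the eval.py script under fluid/PaddleCV/object_detection with the profiler enabled to dump the profiling information. Thanks.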
Adding @luotao1.