Performance regression of box_coder op (#15618) · Issue · PaddlePaddle / Paddle

Performance regression of box_coder op

Created by: hshen14

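Recently, we upgraded to a newer PaddlePaddle commit (1a252f4b, from 1/30) and ran SSD-MobileNet. There seems to be a performance regression in the box_coder op: in the profiles below, its average time per call goes from 1.37962 to 44.334, roughly 32x slower.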

Before the upgrade, the profile was:

Event                      Calls       Total       Min.        Max.        Ave.        Ratio.      
thread0::conv2d            2350        37918.7     0.091484    50.9656     16.1356     0.906646    
thread0::multiclass_nms    50          1262.89     24.996      26.1822     25.2578     0.0301961   
thread0::prior_box         300         980.794     0.144806    11.5341     3.26931     0.023451    
thread0::reshape2          1200        906.852     0.004585    8.57195     0.75571     0.0216831   
thread0::transpose2        650         277.378     0.027228    2.43797     0.426735    0.00663217  
thread0::softmax           50          221.55      4.39363     4.63009     4.43099     0.00529731  
thread0::concat            200         156.415     0.071128    2.49121     0.782075    0.00373992  
thread0::box_coder         50          68.9811     1.3714      1.39247     1.37962     0.00164936  
thread0::flatten2          600         12.1264     0.012349    0.059559    0.0202107   0.000289946 
thread0::fetch             50          11.177      0.216979    0.23537     0.22354     0.000267245 
thread0::assign_value      600         5.6246      0.004578    0.02607     0.00937434  0.000134486 
thread0::feed              50          0.559388    0.01022     0.012291    0.0111878   1.33751e-05

After the upgrade, the profile is:

Event                      Calls       Total       Min.        Max.        Ave.        Ratio.      
thread0::conv2d            2350        37742.3     0.086507    50.8266     16.0605     0.861045    
thread0::box_coder         50          2216.7      44.2231     44.5352     44.334      0.0505714   
thread0::multiclass_nms    50          1311.86     26.0629     27.1475     26.2373     0.0299286   
thread0::prior_box         300         979.043     0.126828    11.6266     3.26348     0.0223357   
thread0::reshape2          1200        911.08      0.005721    8.56092     0.759234    0.0207852   
thread0::transpose2        650         255.189     0.025399    2.42074     0.392599    0.00582184  
thread0::softmax           50          226.04      4.399       4.72333     4.52081     0.00515685  
thread0::concat            200         159.471     0.073097    2.49975     0.797355    0.00363814  
thread0::flatten2          600         14.4118     0.015236    0.057221    0.0240196   0.000328788 
thread0::fetch             50          10.9862     0.209115    0.238141    0.219724    0.000250637 
thread0::assign_value      600         5.58148     0.004008    0.036771    0.00930247  0.000127335 
thread0::feed              50          0.447534    0.007618    0.019401    0.00895068  1.021e-05
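
To make the comparison between the two profiles easier to repeat, here is a minimal sketch that parses profiler tables in the layout shown above and flags ops whose average time per call grew by some factor. The helper names (`parse_profile`, `find_regressions`) and the 2x threshold are illustrative assumptions, not part of PaddlePaddle; only three rows from each table are included to keep it short.

```python
# Hypothetical helpers for comparing two profiler dumps; not a PaddlePaddle API.

BEFORE = """
Event                      Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d            2350        37918.7     0.091484    50.9656     16.1356     0.906646
thread0::multiclass_nms    50          1262.89     24.996      26.1822     25.2578     0.0301961
thread0::box_coder         50          68.9811     1.3714      1.39247     1.37962     0.00164936
"""

AFTER = """
Event                      Calls       Total       Min.        Max.        Ave.        Ratio.
thread0::conv2d            2350        37742.3     0.086507    50.8266     16.0605     0.861045
thread0::box_coder         50          2216.7      44.2231     44.5352     44.334      0.0505714
thread0::multiclass_nms    50          1311.86     26.0629     27.1475     26.2373     0.0299286
"""


def parse_profile(text):
    """Map op name -> average time per call (the 'Ave.' column)."""
    averages = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 7-column data row.
        if len(fields) != 7 or fields[0] == "Event":
            continue
        op_name = fields[0].split("::")[-1]  # drop the "thread0::" prefix
        averages[op_name] = float(fields[5])
    return averages


def find_regressions(before, after, threshold=2.0):
    """Return (op, old_ave, new_ave, ratio) for ops slower by >= threshold."""
    rows = []
    for op, old in before.items():
        new = after.get(op)
        if new is not None and old > 0 and new / old >= threshold:
            rows.append((op, old, new, new / old))
    # Worst regression first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


regressions = find_regressions(parse_profile(BEFORE), parse_profile(AFTER))
for op, old, new, ratio in regressions:
    print(f"{op}: {old} -> {new} ({ratio:.1f}x)")
```

With these rows, box_coder is the only op flagged; conv2d and multiclass_nms stay within a few percent of their earlier averages.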

We noticed the recent commit to box_coder by @jerrywgz. Could you please take a look and confirm the issue? You can use the eval.py script under fluid/PaddleCV/object_detection and enable the profiler to dump the performance profiling information. Thanks.
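
For reference, a rough sketch of how the profiler might be wrapped around an evaluation run. It assumes the Fluid-era `paddle.fluid.profiler.profiler` context manager; `profile_run`, `run_fn`, and the output path are hypothetical names, and how eval.py actually wires in the profiler may differ.

```python
def profile_run(run_fn, profile_path="/tmp/box_coder_profile"):
    """Run `run_fn` under the Fluid CPU profiler and print a table sorted by total time.

    `run_fn` is a hypothetical zero-argument callable that executes the
    evaluation loop (for example, the batches that eval.py iterates over).
    """
    # Imported lazily so this sketch can be defined without Paddle installed.
    import paddle.fluid.profiler as profiler

    # state="CPU" profiles CPU ops only; sorted_key="total" orders rows by total time.
    with profiler.profiler("CPU", sorted_key="total", profile_path=profile_path):
        run_fn()
```

Running this once on the older commit and once on 1a252f4b should give two tables that can be compared directly.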

Adding @luotao1 as well.
