Created by: qingqing01
Fix https://github.com/PaddlePaddle/Paddle/issues/11739
Timing on the PyramidBox model (before -> after):
- Time over 10 mini-batches: conv2d_transpose: 1788.08 -> 63.9877; conv2d_transpose_grad: 1308.85 -> 610.87
- Total time of all ops: 19.60 -> 16.36, a reduction of about 16.5%
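
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the speedup factor and the percentage of time saved from the numbers above. The helper names `speedup` and `reduction_pct` are made up for illustration and are not part of Paddle. The time units are not stated in this description, so the script only works with ratios.

```python
# Before/after timings copied from the figures in this description.
# Units are whatever the profiler reported; only ratios are used here.
timings = {
    "conv2d_transpose": (1788.08, 63.9877),
    "conv2d_transpose_grad": (1308.85, 610.87),
    "all ops (total)": (19.60, 16.36),
}


def speedup(before, after):
    """How many times faster the new timing is (hypothetical helper)."""
    return before / after


def reduction_pct(before, after):
    """Percentage of the original time that was saved (hypothetical helper)."""
    return (before - after) / before * 100.0


for name, (before, after) in timings.items():
    print(
        f"{name}: {speedup(before, after):.2f}x faster, "
        f"{reduction_pct(before, after):.1f}% less time"
    )
```

The 16.5% figure corresponds to `reduction_pct(19.60, 16.36)`, i.e. the share of total op time saved, which is equivalent to a speedup of roughly 1.2x. By the same calculation, conv2d_transpose is roughly 28x faster and conv2d_transpose_grad roughly 2.1x faster.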