conv_2d cuDNN operator slow
Created by: tonyyang-svail
While benchmarking https://github.com/dzhwinter/benchmark/blob/master/fluid/mnist.py, I found that conv_2d with cuDNN is slow. It turns out that the bias (element_wise_add and element_wise_add_grad) takes about 80% of the time, while the actual cudnn_conv takes only 10%.
@dzhwinter This looks pretty bad to me. Can you confirm whether you have seen similar results in your benchmarking? I am using nvprof and NVIDIA Visual Profiler.
I am wondering if we can do one of the following.
- See if we can improve elementwise_add, but I am not sure how hard that is in Eigen. @reyoung
- Combine conv and add bias into one operator, like CudnnConvBaseLayer.cpp in v2.
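To make the second option concrete, here is a minimal CPU-side sketch of what the bias step reduces to when it is specialized for a conv output instead of going through a general broadcasting add. The function name AddBiasNCHW and the NCHW contiguous layout are assumptions for illustration; this is not Paddle code and says nothing about how elementwise_add is actually implemented. A GPU version would do the same work inside a CUDA kernel or hand the bias to cuDNN (for example via cudnnAddTensor).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Paddle: adds a per-channel bias in place
// to a convolution output stored contiguously in NCHW order.
//   n  = batch size, c = number of output channels, hw = height * width.
void AddBiasNCHW(std::vector<float>& out, const std::vector<float>& bias,
                 std::size_t n, std::size_t c, std::size_t hw) {
  assert(bias.size() == c);
  assert(out.size() == n * c * hw);
  float* p = out.data();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < c; ++j) {
      const float b = bias[j];  // one bias value per channel
      for (std::size_t k = 0; k < hw; ++k) {
        p[k] += b;  // contiguous walk over one H*W plane
      }
      p += hw;  // advance to the next channel plane
    }
  }
}
```

Because the channel index is fixed for a whole H*W plane, the inner loop needs no per-element broadcast index arithmetic, which is the kind of overhead a dedicated or fused bias path is meant to avoid.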
We need to improve this because this operator is so important in vision tasks.
Issue related: https://github.com/PaddlePaddle/Paddle/issues/7862.