"paddle/fluid/framework/ir/generate_pass.h" does not exist on "999d0fdbef0c18024c89c9a5eee309177dc4e160"
* refine reduce by cub
* optimize KernelDepthwiseConvFilterGrad
* optimize depthwise conv and reduce mean and reduce sum
* fix bug: dilation
* cuda arch and cuda 8 compatible