* refine reduce using CUB
* optimize KernelDepthwiseConvFilterGrad
* optimize depthwise conv, reduce mean, and reduce sum
* fix bug: dilation
* CUDA arch and CUDA 8 compatibility

test=release/1.0.0