Fork自 PaddlePaddle / Paddle
* add elementwise div * move mul and div grad functor * Combine multiple CUDA kernels * Update the reduce interface call * add multi-output * add multi-output div * add branch judge * Package branch * Combine the x and y functions into one