The number of operations can be reduced to improve the performance. (#9045) · Issue · PaddlePaddle / Paddle

Created by: chengduoZH

In Fluid, nearly all computation is implemented by op_kernels, so some models contain a very large number of operations. After careful analysis, some of these operations can be merged into one, which benefits performance.

The picture above shows the timeline of L2DecayRegularizer applied to the parameters. Its equation is:

 new_g = g + regularization_coeff * p

The current implementation uses two op_kernels, scale and elementwise_add. As a result, every such update launches two op_kernels and two CUDA kernels.
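To make the proposed merge concrete, here is a minimal CPU-side sketch in C++ that contrasts the two-step computation with a fused one. It is not Paddle's implementation: the function names `l2_decay_two_pass` and `l2_decay_fused` are hypothetical, and plain loops stand in for the kernels. On a GPU, each loop would roughly correspond to one kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unfused version: mirrors the scale + elementwise_add pair.
// The first pass writes a temporary buffer that the second pass reads back.
std::vector<float> l2_decay_two_pass(const std::vector<float>& g,
                                     const std::vector<float>& p,
                                     float regularization_coeff) {
  assert(g.size() == p.size());
  std::vector<float> scaled(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) {
    scaled[i] = regularization_coeff * p[i];  // stands in for "scale"
  }
  std::vector<float> new_g(g.size());
  for (std::size_t i = 0; i < g.size(); ++i) {
    new_g[i] = g[i] + scaled[i];  // stands in for "elementwise_add"
  }
  return new_g;
}

// Hypothetical fused version: one pass, no temporary buffer.
// Computes new_g = g + regularization_coeff * p element by element.
std::vector<float> l2_decay_fused(const std::vector<float>& g,
                                  const std::vector<float>& p,
                                  float regularization_coeff) {
  assert(g.size() == p.size());
  std::vector<float> new_g(g.size());
  for (std::size_t i = 0; i < g.size(); ++i) {
    new_g[i] = g[i] + regularization_coeff * p[i];
  }
  return new_g;
}
```

Both functions compute the same update. The fused one reads `g` and `p` once and skips the intermediate buffer, which is the saving a merged op would presumably aim for, in addition to launching one kernel instead of two.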
