The number of operations can be reduced to improve the performance.
Created by: chengduoZH
In fluid, nearly all computation is implemented by op_kernels, so some models can contain a very large number of operations. After careful analysis, some of these operations can be merged into one, which benefits performance.
The picture above shows the timeline of L2DecayRegularizer for parameters. The equation is:
new_g = g + regularization_coeff * p
The current implementation uses two op_kernels, scale and elementwise_add, which results in two op_kernel launches and two CUDA kernel launches.
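The difference between the two-op path and a merged op can be sketched as follows. This is only an illustration, not fluid's actual code: plain Python lists stand in for GPU tensors, each function call stands in for one op_kernel launch, and the names `l2_decay_two_ops` and `l2_decay_fused` are hypothetical.

```python
# Illustrative sketch only: lists stand in for GPU tensors, and each
# function call stands in for one op_kernel (and one CUDA kernel) launch.

def scale(p, coeff):
    """First op: compute coeff * p into a new temporary buffer."""
    return [coeff * x for x in p]


def elementwise_add(a, b):
    """Second op: compute a + b element by element."""
    return [x + y for x, y in zip(a, b)]


def l2_decay_two_ops(g, p, regularization_coeff):
    """Two launches, with a temporary written and then read back."""
    tmp = scale(p, regularization_coeff)  # launch 1
    return elementwise_add(g, tmp)        # launch 2


def l2_decay_fused(g, p, regularization_coeff):
    """Hypothetical merged op: one pass, no temporary buffer."""
    return [gi + regularization_coeff * pi for gi, pi in zip(g, p)]


g = [0.5, -1.0, 2.0]   # gradient
p = [1.0, 2.0, -4.0]   # parameter
coeff = 0.5            # regularization_coeff

# Both paths compute new_g = g + regularization_coeff * p.
assert l2_decay_two_ops(g, p, coeff) == l2_decay_fused(g, p, coeff)
```

In the merged version each element of `g` and `p` is read once and `new_g` is written once, so the intermediate buffer and the second launch both disappear. How much that saves on a real device depends on tensor size and launch overhead.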