    Optimize layer norm backward CUDA kernel when cols is 1024. (#39247) · 99cfcc09
    Committed by Li Min
    * Add fp16 support for scale/bias in the fused_layernorm_residual_dropout_bias op.
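
    A minimal sketch of the idea, under assumptions (hypothetical names, one row with precomputed statistics; this is not the PR's actual kernel): keeping the element type T and the scale/bias type ScaleT as separate template parameters lets gamma/beta be fp16 while the arithmetic still runs in fp32.

        #include <cuda_fp16.h>

        // Hypothetical sketch: T is the data type, ScaleT the scale/bias type.
        // Values are widened to float for the math, then narrowed back to T.
        template <typename T, typename ScaleT>
        __global__ void LayerNormScaleBiasSketch(const T* x, const ScaleT* gamma,
                                                 const ScaleT* beta, T* y,
                                                 float mean, float inv_std,
                                                 int cols) {
          int col = blockIdx.x * blockDim.x + threadIdx.x;
          if (col < cols) {
            float xv = static_cast<float>(x[col]);
            float g = static_cast<float>(gamma[col]);
            float b = static_cast<float>(beta[col]);
            y[col] = static_cast<T>((xv - mean) * inv_std * g + b);
          }
        }
        // Instantiating with T = __half, ScaleT = __half gives an fp16
        // scale/bias path; T = __half, ScaleT = float keeps fp32 parameters.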
    
    * Remove useless code.
    
    * Remove useless code.
    
    * Optimize layer_norm fwd when cols is 1024.
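
    A hedged sketch of what a cols == 1024 forward specialization can look like (illustrative names and layout, not the PR's code): one block of 256 threads handles one row, each thread keeps 4 columns in registers, and the mean/variance reductions run block-wide, so the row is read from global memory only once. The real kernel may use warp shuffles and vectorized loads instead of the shared-memory tree shown here.

        #include <cuda_fp16.h>

        // One block per row; 256 threads x 4 columns = 1024. Launch with
        // <<<rows, 256>>>. Statistics are accumulated in fp32.
        template <typename T>
        __global__ void LnFwd1024Sketch(const T* __restrict__ x,
                                        const float* __restrict__ gamma,
                                        const float* __restrict__ beta,
                                        T* __restrict__ y, float epsilon) {
          constexpr int kCols = 1024;
          constexpr int kVec = 4;
          const T* row = x + blockIdx.x * kCols;
          T* out = y + blockIdx.x * kCols;

          float v[kVec];
          float sum = 0.f;
        #pragma unroll
          for (int i = 0; i < kVec; ++i) {
            v[i] = static_cast<float>(row[threadIdx.x * kVec + i]);
            sum += v[i];
          }
          // Block-wide sum for the mean.
          __shared__ float smem[256];
          smem[threadIdx.x] = sum;
          __syncthreads();
          for (int s = 128; s > 0; s >>= 1) {
            if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
            __syncthreads();
          }
          float mean = smem[0] / kCols;

          float var_sum = 0.f;
        #pragma unroll
          for (int i = 0; i < kVec; ++i) {
            float d = v[i] - mean;
            var_sum += d * d;
          }
          __syncthreads();  // all reads of smem[0] done; safe to reuse smem
          smem[threadIdx.x] = var_sum;
          __syncthreads();
          for (int s = 128; s > 0; s >>= 1) {
            if (threadIdx.x < s) smem[threadIdx.x] += smem[threadIdx.x + s];
            __syncthreads();
          }
          float inv_std = rsqrtf(smem[0] / kCols + epsilon);

        #pragma unroll
          for (int i = 0; i < kVec; ++i) {
            int c = threadIdx.x * kVec + i;
            out[c] = static_cast<T>((v[i] - mean) * inv_std * gamma[c] + beta[c]);
          }
        }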
    
    * Remove useless code.
    
    * Minor fixes.
    
    * Minor fixes.
    
    * Modifications according to review comments.
    
    * Minor fixes.
    
    * Optimize layer_norm bwd kernel when cols is 1024.
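
    A sketch of the matching backward specialization under the same assumptions (hypothetical names; this sketch accumulates dgamma/dbeta with atomicAdd for brevity, whereas a production kernel would typically stage per-block partials and reduce them in a second pass). Per row, dx needs two reduced quantities, sum(gamma*dy) and sum(gamma*dy*xhat):

        #include <cuda_fp16.h>

        // One block per row, 256 threads x 4 columns; launch with
        // <<<rows, 256>>>. dgamma/dbeta are fp32 buffers of length 1024,
        // assumed zero-initialized before the launch.
        template <typename T>
        __global__ void LnBwd1024Sketch(const T* __restrict__ x,
                                        const T* __restrict__ dy,
                                        const float* __restrict__ gamma,
                                        const float* __restrict__ mean,
                                        const float* __restrict__ inv_std,
                                        T* __restrict__ dx,
                                        float* __restrict__ dgamma,
                                        float* __restrict__ dbeta) {
          constexpr int kCols = 1024, kVec = 4;
          const int row = blockIdx.x;
          const float mu = mean[row], rs = inv_std[row];

          float xhat[kVec], g_dy[kVec];
          float s1 = 0.f, s2 = 0.f;
        #pragma unroll
          for (int i = 0; i < kVec; ++i) {
            int c = threadIdx.x * kVec + i;
            float xv = static_cast<float>(x[row * kCols + c]);
            float dyv = static_cast<float>(dy[row * kCols + c]);
            xhat[i] = (xv - mu) * rs;
            g_dy[i] = gamma[c] * dyv;
            s1 += g_dy[i];
            s2 += g_dy[i] * xhat[i];
            atomicAdd(&dgamma[c], dyv * xhat[i]);
            atomicAdd(&dbeta[c], dyv);
          }
          // Block-wide reduction of the two row statistics.
          __shared__ float sm1[256], sm2[256];
          sm1[threadIdx.x] = s1;
          sm2[threadIdx.x] = s2;
          __syncthreads();
          for (int s = 128; s > 0; s >>= 1) {
            if (threadIdx.x < s) {
              sm1[threadIdx.x] += sm1[threadIdx.x + s];
              sm2[threadIdx.x] += sm2[threadIdx.x + s];
            }
            __syncthreads();
          }
          float m1 = sm1[0] / kCols, m2 = sm2[0] / kCols;
        #pragma unroll
          for (int i = 0; i < kVec; ++i) {
            int c = threadIdx.x * kVec + i;
            dx[row * kCols + c] =
                static_cast<T>(rs * (g_dy[i] - m1 - xhat[i] * m2));
          }
        }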
    
    * Polish layer_norm_bwd_1024 kernel.
    
    * Limit ln_bwd_1024_kernel to builds with PADDLE_WITH_CUDA defined.
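
    Presumably the standard preprocessor guard, sketched here with an illustrative declaration:

        #ifdef PADDLE_WITH_CUDA
        // The 1024-column fast path is compiled only for CUDA builds;
        // other builds keep using the generic layer_norm backward kernel.
        template <typename T>
        __global__ void ln_bwd_1024_kernel(const T* dy, const T* x, T* dx,
                                           int rows);
        #endif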
    
    * Fix double type compile error.
    
    * Add optimization of ln bwd for fused_dropout_add_ln op.
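
    A hedged sketch of how such a host-side dispatch could look (function and kernel names are hypothetical, not the op's actual API): take the specialized path only when the feature dimension is exactly 1024 and the build has CUDA enabled.

        #include <cuda_runtime.h>

        // Hypothetical dispatcher for the fused dropout + residual + layernorm
        // backward: route cols == 1024 to the fast kernel, else fall back.
        template <typename T>
        void LayerNormBwdDispatch(const T* dy, const T* x, T* dx,
                                  int rows, int cols, cudaStream_t stream) {
        #ifdef PADDLE_WITH_CUDA
          if (cols == 1024) {
            // 256 threads x 4 columns covers the row exactly.
            // ln_bwd_1024_kernel<T><<<rows, 256, 0, stream>>>(dy, x, dx, rows);
            return;
          }
        #endif
          // Generic fallback for all other widths.
          // LnBwdGeneric<T><<<grid, block, 0, stream>>>(dy, x, dx, rows, cols);
        }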
    
    * Polish code.
Changed file: layer_norm_kernel.cu.h (55.6 KB)