1. 11 Feb, 2022 5 commits
  2. 10 Feb, 2022 7 commits
  3. 09 Feb, 2022 13 commits
  4. 08 Feb, 2022 7 commits
  5. 07 Feb, 2022 2 commits
  6. 06 Feb, 2022 1 commit
  7. 04 Feb, 2022 1 commit
  8. 02 Feb, 2022 1 commit
  9. 30 Jan, 2022 1 commit
  10. 29 Jan, 2022 2 commits
    • L
      Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09
      Committed by Li Min
      * Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.
      
      * Remove useless code.
      
      * Remove useless code.
      
      * Optimize layer_norm fwd when cols is 1024.
      
      * Remove useless code.
      
      * Minors.
      
      * Minors.
      
      * Modifications according to reviews.
      
      * Minors.
      
      * Optimize layer_norm bwd kernel when cols is 1024.
      
      * Polish layer_norm_bwd_1024 kernel.
      
      * Limit ln_bwd_1024_kernel to paddle_with_cuda.
      
      * Fix double type compile error.
      
      * Add optimization of ln bwd for fused_dropout_add_ln op.
      
      * Polish code.
    • C
      [PTen] Tidy pten core headers (#39188) · dd990981
      Committed by Chen Weihang
      * open header for custom kernel
      
      * add core utils
      
      * tidy core code
      
      * tidy header
      
      * tidy include
      
      * tidy namespace
      
      * resolve conflict
      
      * fix unittest and coverage
      
      * remove platform using
      
      * resolve conflict
      
      * resolve conflict
      
      * fix digamma namespace error
      
      * fix xpu full kernel error
      
      * fix xpu full kernel error
      
      * polish details
      
      * add place for lib storage