1. 18 Oct, 2021 5 commits
  2. 17 Oct, 2021 2 commits
  3. 16 Oct, 2021 1 commit
  4. 15 Oct, 2021 11 commits
  5. 14 Oct, 2021 18 commits
  6. 13 Oct, 2021 3 commits
    • fix BatchNorm for fp16 (#36376) · 8fd1b6ad
      Committed by Guoxia Wang
      * fix BatchNorm for fp16
    • [PaddlePaddle hackathon] + ADD CELU (#36088) · d7064f04
      Committed by yujun
      * update
      
      * update
      
      * update
      
      * try make CI pass
      
      * doc typo
      
      * update doc string
    • Merge lars op (#35476) · 0c31579c
      Committed by limingshu
      * A tentative try of cudaLaunchCooperativeKernel
      
      * fix bugs
      
      * Totally replace the lars CUDA kernel
      
      * Fix bugs
      
      * a test for lars merge
      
      * Adding las_op_momentum infer_shape
      
      * Fix codes
      
      * use avg_numel instead of max_numel to acquire grid num
      
      * modify unittest files about lars op
      
      * Finally converge when merged-lars works
      
      * fix ctest files
      
      * add merged_operation kernel when cuda version is older than 11
      
      * Fix code style
      
      * fix ctest failure
      
      * fix error
      
      * fix all ctest error and change lars compute code of cpu
      
      * fix bugs on v100.
      
      * revert python modification about lars
      
      * revert python modification codes