1. 25 Aug, 2022 5 commits
    • H
      optimize conv algo cache (#41891) · 1cd7e68b
      hong authored
      * optimize conv algo speed
      
      * code polish
      
      * remove useless code
      
      * fix compile error
      
      * fix cpu compile error
      
      * not use cudnn algo t
      
      * add search cache max number
      
      * polish code
      
      * fix cache test bug
      
      * add groups data format to conv args
      
      * fix cache test bug
      
      * fix cudnn_deterministic bug
      
      * fix test switch auto tune bug
      
      * fix test switch autotune bug
      
      * fix conv cache bug
      
      * fix cache test error
      
      * fix cache test bug
      
      * fix windows mac compile error
      
      * fix workspace search error
      
      * update cudnn cache
      
      * fix cache test bug; test=develop
      
      * fix autotune switch test error
      
      * polish code
      
      * polish code
      1cd7e68b
    • R
      [triu_indices] add triu_indices_op (#45168) · a410c397
      Rayman authored
      a410c397
    • S
      Fix unique_kernel bugs (#45032) · ea1f4702
      sprouteer authored
      * fix unique_kernel bugs
      
      * fix unique kernel cu bugs
      ea1f4702
    • H
      Fix relu python call (#45082) · 839fac65
      hong authored
      * add python final state
      
      * fix bug
      
      * fix bugs
      
      * fix bug
      
      * fix bug
      
      * revert impl, final state mul not support selected rows
      
      * fix softmax use cudnn error
      
      * add soft label false unit test
      
      * revert loss.py
      839fac65
    • H
      add temporal shift and grad *test=kunlun (#45300) · 63d9a175
      haosicheng authored
      63d9a175
  2. 24 Aug, 2022 8 commits
  3. 23 Aug, 2022 8 commits
  4. 22 Aug, 2022 4 commits
  5. 19 Aug, 2022 4 commits
  6. 18 Aug, 2022 6 commits
  7. 17 Aug, 2022 4 commits
  8. 16 Aug, 2022 1 commit
    • C
      [Phi] Move amp ops into phi (#45079) · b4f67757
      Chen Weihang authored
      * move check finite and unscale kernel into phi
      
      * move infershape into phi
      
      * move update_loss_scaling kernel into phi
      
      * remove original kernels
      
      * move update loss scaling infershape into phi
      
      * add header for xpu and npu
      
      * solve coverage failed
      
      * fix npu test failed
      
      * remove mutable data in cu file
      
      * fix new executor failed
      
      * add valid check for meta tensor output
      b4f67757