1. 10 May, 2023 1 commit
    • [cherry-pick 2.5] Broadcast && Dropout_nd Performance Optimization into Release/2.5 (#53623) · f9ea2301
      Bo Zhang committed
      * Support different dtypes of inputs for broadcast for dropout optimization  (#52093)
      
      * change judgement for DropoutGradGPUKernelDriver
      
      * add UnrollerWithoutVecSize and after this Loaddata to be refined
      
      * pass unittest
      
      * use same unroller with XPU
      
      * BroadcastWithInt64Index
      
      * BroadcastDataLoader template partial specialization
      
      * fix compile errs in ROCms
      
      * PR comment
      
      * dropout_nd_optimization (#51479)
      
      * with printf
      
      * add DropOutNdForwardKernel
      
      * PR comment
      
      * Dropout optimize & clean broadcast inT and ElementwiseType (#52969)
      
      * change judgement for DropoutGradGPUKernelDriver
      
      * add UnrollerWithoutVecSize and after this Loaddata to be refined
      
      * pass unittest
      
      * use same unroller with XPU
      
      * BroadcastWithInt64Index
      
      * BroadcastDataLoader template partial specialization
      
      * fix compile errs in ROCms
      
      * clean ElementwiseT and InT for BroadcastKernel
      
      * default axis and clean inT
      
      * remove redundant fast divmod computation
      
      * optimize drop_nd & drop_nd_grad
      
      * optimize BroadcastDataLoader bf16 fp16
      
      * rm InT etc. after merge develop
      
      * delete constexpr for windows ci
      
      * fix conflict
      
      * fix conflict with develop
      
      * fix conflict
      
      * new clean
      
      * clean
      
      * Fix xpu2 kp compile error (#53548)
      
      * fix conflict
      
      * conflict
  2. 22 February, 2023 1 commit
  3. 14 February, 2023 1 commit
  4. 03 January, 2023 1 commit
  5. 14 December, 2022 1 commit
  6. 05 December, 2022 1 commit
  7. 28 November, 2022 1 commit
  8. 23 November, 2022 1 commit
  9. 17 November, 2022 2 commits
  10. 31 October, 2022 1 commit
  11. 19 September, 2022 1 commit
    • Performance fix for broadcast kernel [Part3] (#46071) · 46e4fb2a
      limingshu committed
      * first commit
      
      * refine code with template argument
      
      * refine code with template argument
      
      * add ternary broadcast test file
      
      * add ternary broadcast test file
      
      * fix according to ci
      
      * fix op-benchmark ci error
  12. 16 September, 2022 1 commit
    • Support broadcast elementwise operators with int64 index type (#45741) · 20b5bf84
      sneaxiy committed
      * support int64 non-broadcast
      
      * support broadcast case for int64 index
      
      * fix bug
      
      * support more Arity
      
      * remove some codes
      
      * upgrade patchelf to v0.15.0 to pass CI build
      
      * fix bug
      
      * fix patchelf installation
      
      * add debug flags
      
      * remove useless codes
      
      * fix viterbi_decode and set_value op uts
      
      * remove always enable int64
  13. 15 September, 2022 1 commit
  14. 07 September, 2022 1 commit
  15. 23 August, 2022 1 commit
  16. 06 June, 2022 1 commit
  17. 05 June, 2022 1 commit
  18. 20 May, 2022 1 commit
  19. 16 May, 2022 1 commit
  20. 12 May, 2022 1 commit
  21. 10 May, 2022 1 commit
  22. 27 April, 2022 1 commit
    • Optimize performance of dygraph (v4) (#42196) · 37e2f027
      zyfncg committed
      * optimize performance of dygraph
      
      * optimize performance of dygraph and elementwise_add
      
      * optimize the trace op
      
      * fix bug
      
      * fix bug
      
      * fix unittest bug
      
      * fix code format
  23. 25 April, 2022 1 commit
  24. 07 March, 2022 1 commit
  25. 04 March, 2022 1 commit
  26. 02 March, 2022 1 commit
  27. 23 February, 2022 1 commit
  28. 20 February, 2022 2 commits