1. 14 Jun, 2023 1 commit
    • Fix cuda12 timeout problems. (#54615) · a90d9088
      Ghost Screaming committed
      * Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
      is wrong.
      
      * Remove climits.
      
      * Fix problem of pickle and NCCL_P2P_DISABLE in distributed testcases in
      cuda12.
      
      * Fix problem of TimeOut of distributed testcases under cuda12.
      
      * Remove useless modification.
      
      * Remove useless modification.
  2. 13 Jun, 2023 2 commits
  3. 02 Jun, 2023 1 commit
  4. 01 Jun, 2023 1 commit
  5. 26 Apr, 2023 1 commit
  6. 17 Apr, 2023 2 commits
  7. 10 Apr, 2023 1 commit
  8. 29 Mar, 2023 1 commit
    • Add Fuse Adamw Pass (#50484) · 66098bff
      yuehuayingxueluo committed
      * add fuse adamw pass
      
      * fix some bugs
      
      * fix CI bug
      
      * change chunk_size
      
      * fix CI bug
      
      * rm test_fused_adam_op.py
      
      * fix CI bugs
      
      * fix fuse_adamw_op_pass.cc
      
      * change code style
      
      * fix CI bug
      
      * fix ut bug and use_adamw_op_pass.cc
      
      * fix test_fuse_adamw_pass.py
      
      * fix CI bug
      
      * remove fluid
      
      * fix ci bug
      
      * fix CI bug
  9. 23 Mar, 2023 1 commit
  10. 22 Mar, 2023 1 commit
  11. 16 Mar, 2023 1 commit
    • [Auto Parallel Performance] Support BF16 Training (#51285) · 9ded5707
      JZ-LIANG committed
      * update env setting
      
      * update pass logic
      
      * dist op support bf16
      
      * backward cast update
      
      * update setting
      
      * update backward
      
      * revert amp pass
      
      * update fp16 backward logic
      
      * register c_embedding bf16
      
      * revert engine
      
      * add unittest

      * add unittest

      * update unittest
      
      * update cmake
      
      * update math
      
      * update math.py
      
      * update unittest

      * update unittest

      * revise unittest

      * revise unittest

      * update unittest

      * update unittest

      * update unittest
  12. 14 Mar, 2023 1 commit
  13. 15 Feb, 2023 1 commit
    • align tool (#49865) · 4632ca13
      xu98bin committed
      * auto parallel align tool
      
      * modify function get_var's return
      
      * add save and load in align_tool
      
      * modify load function and save function
      
      * add finding different ops in align tool
      
      * full auto parallel align tool
      
      add test file for auto parallel align tool
      
      set timeout for test
      
      modify get_backward_tmp_var function
      
      add annotation for align tool
      
      modify test file
      
      modify code to restart CI
      
      remove timeout
      
      * set timeout
  14. 09 Feb, 2023 1 commit
    • Fix bugs in pass_base.py (#50136) · 5cae5fdd
      yuehuayingxueluo committed
      * fix the processing order of passes in pass_base.py
      
      * fix processing order
      
      * add _PASS_PROCESS_ORDER_LIST
      
      * delete some pass in _PASS_PROCESS_ORDER_LIST
      
      * add assert in pass_base.py
      
      * remove fuse_optimizer
      
      * add _fusion_opt_list_rule
      
      * add test_pass_base_list.py
      
      * fix some bug
      
      * add fused_attention
      
      * add some passes to list
      
      * fix ci bug
      
      * fix ci bug
  15. 11 Jan, 2023 1 commit
    • add FusedLinear pass (#49606) · 0f08a432
      yuehuayingxueluo committed
      * add FusedLinear pass
      
      * add fused_op_list and rename PASSES to OP_FUSION
      
      * add fused_passes_list to constants.py
      
      * add test_passes.py
      
      * fix test_fused_passes.py
      
      * fix add if float(paddle.version.cuda()) >= 11.6:
      
      * renamed test_fused_passes.py
      
      * fix CMakeList.txt
  16. 29 Dec, 2022 1 commit
  17. 27 Dec, 2022 2 commits
  18. 14 Dec, 2022 1 commit
    • [AutoParallel] recompute tuning (#48608) · 170a31f9
      zhaoyingli committed
      * [AutoParallel] recompute tuning
      
      * fix conflict
      
      * update comment
      
      * bug fix
      
      * update rc algo
      
      * tiny fix
      
      * fix clear process_group
      
      * remove comment
      
      * update segment print
      
      * fix import OpRole
      
      * adapt amp pass and grad_clip pass for opt_tuner
      
      * update tuning config
      
      * fix import
      
      * annotate recompute info on ops and upgrade recompute pass
      
      * add op_namescope for seed op
      
      * record reserved vars
      
      * fix recompute var's dist_attr
      
      * fix strategy unittest
      
      * adapt for fp16
      
      * update unittest
      
      * revert copy opt
      
      * update unittest
      
      * rename set_recompute_segments
      
      * fix unittest
  19. 08 Dec, 2022 1 commit
  20. 29 Nov, 2022 1 commit
  21. 28 Nov, 2022 1 commit
  22. 22 Nov, 2022 1 commit
  23. 18 Nov, 2022 1 commit
  24. 07 Nov, 2022 1 commit
  25. 31 Oct, 2022 1 commit
  26. 28 Oct, 2022 1 commit
  27. 18 Oct, 2022 2 commits
    • [Auto Parallel]Add parallel tuner (#46189) · 3108ba11
      caozhou committed
      * add parallel tuner
      
      * add unittest
      
      * fix unittest
      
      * set timeout of unittest
      
      * set unittest timeout
      
      * fix auto_mode setting
      
      * update unittest
      
      * sync from develop and update unittest
      
      * remove unused import
      
      * update unittest
      
      * update cmakelist
      
      * add unittests
    • [AutoParallel] add callbacks (#47014) · 7c92177c
      zhaoyingli committed
      * [AutoParallel] add callbacks
      
      * fix unittest
      
      * fix dist_context
      
      * fix engine
      
      * fix cmakelist
      
      * fix unittest's returns
      
      * fix cmakelist
  28. 14 Oct, 2022 1 commit
  29. 28 Sep, 2022 1 commit
  30. 15 Sep, 2022 1 commit
    • [Auto Parallel] Improve the APIs (#45776) · b042a3b1
      Yulong Ao committed
      * [Auto Parallel] Use c++ dist attr in the completion process
      
      * [Auto Parallel] Add minor changes
      
      * [Auto Parallel] Use c++ dist attr in the completion process
      
      * [Auto Parallel] Add minor changes
      
      * [Auto Parallel] Add the serialization process for dist attrs
      
      * [Auto Parallel] Remove unnecessary comments
      
      * [Auto Parallel] Fix some bugs
      
      * [Auto Parallel] Fix the code style
      
      * [Auto Parallel] Remove unnecessary impls
      
      * [Auto Parallel] Fix the importing error
      
      * [Auto Parallel] Fix the copy from bugs of op dist attr
      
      * [Auto Parallel] Replace the use of constexpr if
      
      * [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh
      
      * [Auto Parallel] Change API of the completion unittest
      
      * [Auto Parallel] Fix the bug when set_attr an int
      
      * [Auto Parallel] Add the unittest for the serialization
      
      * [Auto Parallel] Add some unit tests
      
      * [Auto Parallel] Unify the strategy
      
      * [Auto Parallel] Improve the engine api
      
      * [Auto Parallel] Reset the changes made to the framework
      
      * [Auto Parallel] Change the engine unittest
      
      * [Auto Parallel] Update API of the completion and partitioner
      
      * [Auto Parallel] Update unit tests using engine api
      
      * update shard annotation
      
      * [Auto Parallel] Remove the modifications of other modules
      
      * [Auto Parallel] Add docs for APIs
      
      * add new strategy
      
      * [Auto Parallel] Replace the logger
      
      * [Auto Parallel] Restore the test_program.py
      
      * [Auto Parallel] Change the import rules
      
      * [Auto Parallel] Add the examples for Engine
      
      * [Auto Parallel] Do some minor changes
      
      * [Auto Parallel] Remove yaml dependency
      
      * [Auto Parallel] Fix the unittests
      
      * add valid after train
      
      * bug fix
      Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
      Co-authored-by: caozhou <caozhou@radi.ac.cn>
      Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>
  31. 07 Sep, 2022 1 commit
  32. 05 Sep, 2022 1 commit
  33. 31 Aug, 2022 2 commits
  34. 23 Aug, 2022 1 commit
  35. 18 Aug, 2022 1 commit