1. 09 Nov, 2022 1 commit
  2. 17 Oct, 2022 1 commit
    • [Cherry-pick] Collective communication APIs (#46922) · 5fba2a98
      Committed by Wen Sun
      * Support both use_calc_stream and sync_op in send recv APIs (#46023)
      
      * Support both use_calc_stream and sync_op in allgather API (#46295)
      
      * Support both use_calc_stream and sync_op in collective communication API (#46761)
      
      * Move group and all reduce from collective to communication (#45848)
      
      * Completes bfloat16 dtype for collective api in eager mode (#45844)
      
      * Fix collective APIs cannot be recognized when building docs (#46962)
      Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
  3. 22 Sep, 2022 1 commit
    • logger manager (#45909) (#46087) · 7eb046c7
      Committed by Roc
      uniform logger manager in FleetAPI.
      hide API under distributed/utils which users don't need.
  4. 31 Aug, 2022 1 commit
  5. 28 Jul, 2022 1 commit
  6. 11 Jul, 2022 1 commit
  7. 05 Jun, 2022 1 commit
    • 【code format check upgrade】 step2:yapf (#42944) · a072fca8
      Committed by Sing_chan
      * use yapf to format all python file
      
      * yapf exclude two unittests file for they rely on writing and reading file, and format will break them
      
      * disable diff_py_file because too many diff files cause command following failed
  8. 12 Apr, 2022 1 commit
  9. 23 Mar, 2022 1 commit
  10. 09 Mar, 2022 1 commit
  11. 26 Nov, 2021 1 commit
  12. 29 Oct, 2021 1 commit
    • [Auto Parallel] Improve the interface and the underlying mechanisms (#36617) · a02532b5
      Committed by Yulong Ao
      * default dist op
      
      * add dist_attr for dist op
      
      * add unittest
      
      * update inputname
      
      * update function name
      
      * add unittest
      
      * update CMakeLists.txt for CI
      
      * fix dis_matmul
      
      * fix compile error
      
      * update matmul to matmul_v2
      
      * unify api
      
      * unify api
      
      * todo
      
      * update distop forward func
      
      * update distop forward func
      
      * auto parallel backward
      
      * update dist op
      
      * autoparallel backward
      
      * add backward for embedding
      
      * temp1
      
      * temp2
      
      * temp3
      
      * temp4
      
      * backward done1
      
      * backward done2
      
      * backward done3
      
      * dist embedding remove mp mode
      
      * dist matmul remove mp mode
      
      * update dist embedding
      
      * dist op init1
      
      * dist op init 2
      
      * update unittest
      
      * context remove parallel mode
      
      * partitioner remove parallel mode
      
      * update unittest
      
      * a more general method to support varying mesh in pipeline parallel
      
      * support varying mesh in pipeline parallel
      
      * embedding support varying mesh in pipeline parallel
      
      * matmul support varying mesh in pipeline parallel
      
      * default dist op support varying mesh in pipeline parallel
      
      * dist attribute for startup program
      
      * default dist op support varying mesh in pipeline parallel 2
      
      * partitioner support varying mesh in pipeline parallel
      
      * revise logic for auto completion
      
      * revise framework.py
      
      * revise reshard unittest
      
      * revise unittest for parallelize
      
      * chmod
      
      * fixed bug for dist embedding name mapping
      
      * Improve the interface and the underlying mechanisms of auto parallel
      
      * revise completion for backward
      
      * revise completion for update
      
      * revise completion for update
      
      * update unittest
      
      * chmod
      
      * bugfix for grad_op output var's mesh
      
      * Modify codes for pr 36744
      
      * Remove unnecessary comments in framework.py
      
      * Remove unnecessary comments in completion.py
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
      Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
      Co-authored-by: JZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>
  13. 18 Sep, 2021 1 commit
  14. 17 Sep, 2021 1 commit
  15. 08 Sep, 2021 1 commit
  16. 24 Aug, 2021 1 commit
    • Add auto completion module for auto parallel (#34813) · 93d862b0
      Committed by Yulong Ao
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * add dist
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * delete unused proto
      
      * restore op_desc
      
      * restore type_defs
      
      * update var_desc
      
      * remove dimss_mapping for proto_pybind
      
      * update interface.py
      
      * update framework.py
      
      * update
      
      * update
      
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * [WIP] Add the auto completion feature and related codes
      
      * [WIP] Improve the auto completion and related codes
      
      * [WIP] Make the auto completion to support data-parallel
      
      * [WIP] Make the completion support mp and dp+mp
      
      * [WIP] Refactor auto completion unit test for MLP
      
      * [WIP] Refactor the implementation of DistributedOperatorImpl
      
      * [WIP] Improve dims_mapping update rule and fix a bug
      
      * [WIP] Support auto completion for one transformer decoder layer
      
      * [WIP] Add a minor change
      
      * [WIP] Fix a bug within the unit test
      
      * Shard XShape tensor, add embedding completion and refactor code
      
      * Add the distributed_operators dir to setup.py.in
      
      * Improve the completion process and add the unittest for gpt
      
      * fix process_mesh ut
      
      * fix process_mesh ut
      
      * update
      
      * update, test=develop
      
      * Add support for automatically completing distributed attrs of special ops
      
      * update
      
      * update
      
      * update
      
      * fix doc sample codes, test=develop
      
      * improve coverage, test=develop
      
      * add static_mode check, test=develop
      
      * Model the cluster for cost model and physical mapping
      
      * update, test=develop
      
      * add set_placement, test=develop
      
      * Add the check to make sure the candidate tensors' size is greater than zero
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update, test=develop
      
      * Auto mark dist attrs annotated by user
      
      * update ndarray to nested list, test=develop
      
      * update, test=develop
      
      * Add auto-completion module for auto-parallel (based on PR#33804)
      
      * Remove unnecessary files
      
      * Remove unrelated files for the auto completion pr
      
      * Update the unit test to improve the coverage
      
      * Modify codes based on reviews
      
      * Minor changes for CI
      
      * Improve some codes based on new comments
      
      * Fix bugs caused by shallow copy in attributes.py
      * Improve amend_distributed_attr_for_program in context.py
      * Other changes for weihang's comments
      Co-authored-by: sandyhouse <lilong12@baidu.com>
  17. 23 Aug, 2021 1 commit
  18. 11 Aug, 2021 1 commit
  19. 06 May, 2021 1 commit
  20. 24 Feb, 2021 1 commit
    • fix entry (#31079) · ebbdf525
      Committed by tangwei12
      * fix entry
      
      * fix distributed lookup table fuse case
      
      * fix entry bug at first time
      
      * move entry from paddle.fluid -> paddle.distributed
      
      * fix ut with paddle.enable_static()
      Co-authored-by: malin10 <malin10@baidu.com>
  21. 08 Jan, 2021 1 commit
  22. 28 Sep, 2020 1 commit
  23. 16 Sep, 2020 1 commit
  24. 29 Aug, 2020 1 commit
  25. 28 Aug, 2020 1 commit
    • Add interface to launch parallel dygraph by multiprocessing (#26044) · 31f422ae
      Committed by Chen Weihang
      * add dygraph parallel run interface
      
      * polish implement & unified env property name
      
      * add print config arg
      
      * refactor init_parallel_env function
      
      * Compatible with multiprocessing and launch modes
      
      * set default trainer start port
      
      * support run in python 2
      
      * polish python2 support code
      
      * remove python2 support
      
      * refine launch import
      
      * polish dome design details
      
      * refactor api implementation & path
      
      * use new method _set_expected_place
      
      * add spawn unittest framework & mnist test
      
      * add more unittests & doc
      
      * fix unittest failed
      
      * polish english doc
      
      * self review and polish details
      
      * refactor code by reviewer's comments
      
      * fix unittest failed
      
      * fix parallel_env unittest
      
      * fix several typos
      
      * fix error introduced when fixing typos
      
      * add unpublic note for start_processes
      
      * polish details by xiaoguang's comment
      
      * verify correctly when spawn nprocs=-1
      
      * refactor spawn & init_parallel_env design
      
      * polish doc details
      
      * open spawn unittests
      
      * try to fix doc compile error
      
      * try to fix unknown doc format error
      
      * add skip unittest when not gpu
  26. 27 Aug, 2020 1 commit
  27. 07 Jul, 2020 1 commit
  28. 08 May, 2020 1 commit
  29. 12 Feb, 2019 1 commit
  30. 24 Jan, 2019 1 commit
  31. 24 Dec, 2018 1 commit
    • Init paddle slim (#14834) · 93870574
      Committed by whs
      * Init slim.
      
      * Remove distillation demo.
      
      * Fix import errors.
      test=develop
      
      * Fix some issues.
      test=develop
      
      * Fix configs.
      test=develop
      
      * Modify API.spec.
      test=develop
      
      * Fix format.
      test=develop
      
      * Fix format.
      test=develop
      
      * Add some comments.
  32. 02 Jul, 2018 1 commit
  33. 09 Dec, 2016 1 commit
  34. 12 Nov, 2016 1 commit
  35. 29 Aug, 2016 1 commit