1. 29 Jan 2022 (1 commit)
    • Auto parallel/qkv fuse (#39080) · fdedf909
      Committed by JZ-LIANG (see the sketch after this entry)
      * support qkv fuse
      
      * support qkv fuse
      
      * update completion
      
      * update completion
      
      * update dist_split
      
      * rerun ci
      
      * is_auto_compatible added
      
      * is_auto_compatible added
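
The QKV fuse above merges the query, key, and value projections of attention into one concatenated weight so a single matmul replaces three. A minimal NumPy sketch of the idea (shapes and names are illustrative, not the PR's implementation):

```python
import numpy as np

# Hypothetical sizes: 4 tokens, hidden width 8.
hidden = 8
x = np.random.randn(4, hidden)
w_q = np.random.randn(hidden, hidden)
w_k = np.random.randn(hidden, hidden)
w_v = np.random.randn(hidden, hidden)

# Unfused: three separate projections (three matmuls).
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Fused: concatenate the weights once, run one matmul, then split the result.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)   # (hidden, 3 * hidden)
q_f, k_f, v_f = np.split(x @ w_qkv, 3, axis=1)

assert np.allclose(q, q_f) and np.allclose(k, k_f) and np.allclose(v, v_f)
```

When the fused weight is column-sharded across a process mesh, each rank holds a slice of w_qkv and the split happens on the local shard, which is presumably why the PR also touches completion and dist_split.
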
  2. 27 Jan 2022 (2 commits)
  3. 25 Jan 2022 (1 commit)
  4. 21 Jan 2022 (1 commit)
    • [Auto Parallel] Use the new completion algorithm (#39086) · e5cda6fa
      Committed by Yulong Ao (see the sketch after this entry)
      * Add the backward support for QR
      
      * Remove unnecessary comments
      
      * [Auto Parallel] Improve the dist op interface and compatible computation
      
      * Remove unnecessary modification
      
      * Recover some modifications
      
      * Add lost files
      
      * Fix a minor bug
      
      * Fix the bug of the planner
      
      * Fix the format problem
      
      * [Auto Parallel] Update the completion algorithm
      
      * Fix the bug of auto_searcher unittest
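
The completion algorithm referenced above fills in distributed attributes (process mesh and dims_mapping) for tensors the user left unannotated by propagating annotations through the graph until nothing changes. A toy, self-contained sketch of that fixed-point propagation (the data structures and rules are simplified assumptions, not Paddle's actual completion code):

```python
# Toy completion: propagate per-dimension sharding info from annotated tensors
# to unannotated ones until a fixed point is reached.
# dims_mapping[i] = process-mesh axis that shards tensor dim i, or -1 for replicated.
graph = {
    "matmul_out": ("matmul", ["x", "w"]),
    "relu_out": ("relu", ["matmul_out"]),
}
dims_mapping = {"x": [0, -1], "w": None, "matmul_out": None, "relu_out": None}

def complete(graph, dims_mapping):
    changed = True
    while changed:
        changed = False
        for out, (op_type, inputs) in graph.items():
            if dims_mapping[out] is not None:
                continue
            if op_type == "relu":                 # elementwise: copy the input mapping
                src = dims_mapping[inputs[0]]
                if src is not None:
                    dims_mapping[out] = list(src)
                    changed = True
            elif op_type == "matmul":             # row-sharded x implies row-sharded output
                x_map = dims_mapping[inputs[0]]
                if x_map is not None:
                    dims_mapping[out] = [x_map[0], -1]
                    changed = True
    return dims_mapping

print(complete(graph, dims_mapping))
# {'x': [0, -1], 'w': None, 'matmul_out': [0, -1], 'relu_out': [0, -1]}
```
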
  5. 20 Jan 2022 (1 commit)
  6. 18 Jan 2022 (1 commit)
  7. 13 Jan 2022 (1 commit)
  8. 12 Jan 2022 (1 commit)
    • [Dist Pass] Amp Pass (#38764) · cc24427e
      Committed by JZ-LIANG (see the sketch after this entry)
      * auto parallel sharding base
      
      * chmod
      
      * add unitest
      
      * set unitest cmake dist label
      
      * revise code according to rewiew
      
      * chmod
      
      * bugfix for grad_clip and param broadcast
      
      * chmod
      
      * update unitest
      
      * chmod
      
      * add clip
      
      * chmod
      
      * add amp pass
      
      * chmod
      
      * add unitest
      
      * remove grad update
      
      * fixed bug
      
      * fixed bug
      
      * fixed typose
      
      * fixed typoes
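
An AMP (automatic mixed precision) pass of the kind added above usually keeps a white list of ops that are numerically safe in fp16 and a black list that must stay in fp32, inserting casts where a value crosses the boundary. A hedged, framework-agnostic sketch of that rewrite (op names, lists, and helpers are illustrative only):

```python
# Toy AMP rewrite: walk ops in order and insert cast pseudo-ops when a value
# crosses between the fp32-only and fp16-friendly regions of the graph.
WHITE_LIST = {"matmul", "conv2d"}                    # run in fp16
BLACK_LIST = {"softmax_with_cross_entropy", "sum"}   # keep in fp32

def amp_rewrite(ops):
    """ops: list of (op_type, input_names); returns a new op list with casts."""
    new_ops, dtype = [], {}                          # dtype[name] = 'fp16' or 'fp32'
    for op_type, inputs in ops:
        want = "fp16" if op_type in WHITE_LIST else "fp32"
        for name in inputs:
            if dtype.get(name, "fp32") != want:      # insert a cast on the boundary
                new_ops.append(("cast_to_" + want, [name]))
                dtype[name] = want
        new_ops.append((op_type, inputs))
        dtype["out_of_" + op_type] = want            # record the output precision
    return new_ops

prog = [("matmul", ["x", "w"]),
        ("softmax_with_cross_entropy", ["out_of_matmul", "label"])]
for op in amp_rewrite(prog):
    print(op)
```
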
  9. 11 Jan 2022 (1 commit)
  10. 06 Jan 2022 (1 commit)
  11. 31 Dec 2021 (1 commit)
  12. 30 Dec 2021 (1 commit)
  13. 29 Dec 2021 (1 commit)
  14. 24 Dec 2021 (1 commit)
  15. 17 Dec 2021 (1 commit)
  16. 14 Dec 2021 (1 commit)
  17. 12 Dec 2021 (1 commit)
    • Dist op compatible (#37994) · 89bced5e
      Committed by 沉潜的鱼儿
      * dist matmul op compatible
      
      * dist op unittest
      
      * modify dist matmul
      
      * modify dist reshape
      
      * modify dist reshape
      
      * add a space
      
      * add a space
      
      * delete dist matmul op
      
      * modify reshape
      
      * add dist op unittest
      
      * modify dist op unittest
  18. 10 Dec 2021 (1 commit)
  19. 08 Dec 2021 (1 commit)
  20. 07 Dec 2021 (1 commit)
    • [Auto para] Relaunch with auto mapping function (#37326) · 506e79d1
      Committed by Yulong Ao
      * [Auto Parallel]  Add the unified cluster representation
      
      * [Auto Parallel] Add the graph class for physical mapping
      
      * [Auto Parallel] Add the simple physical mapper
      
      * Set the timeout of the mapper
      
      * Merge the upstream develop unittests cmake files
      
      * Fix a bug of the process group
      
      * Remove mapper unittest from platforms which is not GPU
      
      * Move the instantiation of process group after resharding
      
      * Add the local id for devices
      
      * Update the rank mapping format
      
      * [Auto Parallel] Relaunch with the rank mapping file
      
      * Remove the unnecessary json file
      
      * Avoid entering get_device_proc_info for auto mapping
      
      * Correct the mapper unit test
      
      * Add some comments
      
      * Remove the related files about mapping
      
      * Update the unittest for auto mapping
      
      * Remove unused rank_mapping unittest
      
      * Improve the unittest coverage
      
      * Improve the unittest coverage
      
      * Improve the unittest of relaunch
      
      * Fix the unittest problem in CI
      
      * Improve the unittest of relaunch
      
      * Remove unnecessary statements
      
      * Update the unittest cmakefile
      
      * Correct the cmakefile of auto parallel unittests
      
      * Modify codes based on the new elastic change
      
      * Use the GPUs exclusively in the unittest
      
      * Correct the cmakefile
      
      * Set the timeout of the unittest
  21. 30 Nov 2021 (1 commit)
    • [Auto Parallel] Do the physical mapping between the process graph and the cluster graph (#37094) · b0dff05d
      Committed by Yulong Ao (see the sketch after this entry)
      * [Auto Parallel]  Add the unified cluster representation
      
      * [Auto Parallel] Add the graph class for physical mapping
      
      * [Auto Parallel] Add the simple physical mapper
      
      * Set the timeout of the mapper
      
      * Merge the upstream develop unittests cmake files
      
      * Fix a bug of the process group
      
      * Remove mapper unittest from platforms which is not GPU
      
      * Move the instantiation of process group after resharding
      
      * Add the local id for devices
      
      * Update the rank mapping format
      
      * Add some comments
      
      * Remove the related files about mapping
      
      * Update the unittest for auto mapping
      
      * Remove unused rank_mapping unittest
      
      * Improve the unittest coverage
      
      * Improve the unittest coverage
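
The physical mapper assigns each logical process in the process graph to a device in the cluster graph; a common heuristic is to co-locate heavily communicating processes on the same machine. A toy greedy sketch with invented inputs (not the PR's mapper):

```python
# Toy physical mapping: place the heaviest-communicating processes first, and
# prefer the machine whose devices already host that process's peers.
processes = {0: {1: 100, 2: 1}, 1: {0: 100, 3: 1},
             2: {3: 100, 0: 1}, 3: {2: 100, 1: 1}}      # peer -> traffic volume
machines = {"m0": ["gpu0", "gpu1"], "m1": ["gpu2", "gpu3"]}

def greedy_map(processes, machines):
    placement = {}
    free = {m: list(devs) for m, devs in machines.items()}
    order = sorted(processes, key=lambda p: -sum(processes[p].values()))
    for p in order:
        best = None
        for m, devs in free.items():
            if not devs:
                continue
            # score = traffic to peers already placed on this machine
            score = sum(vol for peer, vol in processes[p].items()
                        if placement.get(peer, "").startswith(m + ":"))
            if best is None or score > best[0]:
                best = (score, m)
        m = best[1]
        placement[p] = m + ":" + free[m].pop(0)
    return placement

print(greedy_map(processes, machines))
# processes 0,1 land on m0 and processes 2,3 on m1, keeping the 100-unit flows local
```
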
  22. 27 Nov 2021 (1 commit)
    • [Auto Parallel] Add the graph class for the process and cluster (#37482) · 48faf638
      Committed by Yulong Ao
      * [Auto Parallel]  Add the unified cluster representation
      
      * [Auto Parallel] Add the graph class for physical mapping
      
      * [Auto Parallel] Add the simple physical mapper
      
      * Set the timeout of the mapper
      
      * Merge the upstream develop unittests cmake files
      
      * Fix a bug of the process group
      
      * Remove mapper unittest from platforms which is not GPU
      
      * Move the instantiation of process group after resharding
      
      * Add the local id for devices
      
      * Update the rank mapping format
      
      * Add some comments
      
      * Remove the related files about mapping
      
      * Remove unused rank_mapping unittest
      
      * Improve the unittest coverage
  23. 24 Nov 2021 (2 commits)
  24. 22 Nov 2021 (1 commit)
  25. 12 Nov 2021 (1 commit)
    • [AutoParallel] Add AutoConvert (#36958) · 1773afd7
      Committed by zhaoyingli (see the sketch after this entry)
      * add AutoConvert
      
      * add unitest
      
      * amend merge&slice
      
      * amend default dist_attr
      
      * update doc&improve coverage
      
      * add interface dist_context
      
      * tiny modify
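
AutoConvert's merge & slice lets a checkpoint saved under one parallel configuration be reloaded under another: shards from the old ranks are merged into the full parameter, then re-sliced for the new layout. A minimal NumPy sketch along a single axis (axis choice and shapes are assumptions, not the PR's checkpoint format):

```python
import numpy as np

def merge(shards, axis):
    """Concatenate per-rank shards back into the full parameter."""
    return np.concatenate(shards, axis=axis)

def slice_for(full, num_ranks, axis):
    """Re-slice the full parameter for a new degree of parallelism."""
    return np.split(full, num_ranks, axis=axis)

full = np.arange(24).reshape(4, 6)
old_shards = slice_for(full, 2, axis=1)                       # saved by 2 column-parallel ranks
new_shards = slice_for(merge(old_shards, axis=1), 3, axis=1)  # reloaded on 3 ranks
print([s.shape for s in new_shards])                          # [(4, 2), (4, 2), (4, 2)]
```
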
  26. 02 Nov 2021 (1 commit)
    • [AutoParallel] Save&Load Module (#36558) · b9defb4f
      Committed by zhaoyingli
      * AutoParallel Save&Load
      
      * tiny modi
      
      * update func name
      
      * tiny fix
      
      * add NotImplementedError
      
      * fix doc
      
      * update func name
      
      * update func param
      
      * update interface
      
      * add unitest & modi make_data_unshard
      
      * update unittest
      
      * update unittest
      
      * fix unittest
      
      * fix cmakelist
      
      * update unittest
  27. 29 Oct 2021 (1 commit)
    • [Auto Parallel] Improve the interface and the underlying mechanisms (#36617) · a02532b5
      Committed by Yulong Ao
      * default dist op
      
      * add dist_attr for dist op
      
      * add unitest
      
      * update inputname
      
      * update function name
      
      * add unitest
      
      * update CMakeLists.txt for CI
      
      * fix dis_matmul
      
      * fix compile error
      
      * update matmul to matmul_v2
      
      * unify api
      
      * unify api
      
      * todo
      
      * update distop forward func
      
      * update distop forward func
      
      * auto parallel backward
      
      * update dist op
      
      * autoparallel backward
      
      * add backward for embedding
      
      * temp1
      
      * temp2
      
      * temp3
      
      * temp4
      
      * backward done1
      
      * backward done2
      
      * backward done3
      
      * dist embedding remove mp mode
      
      * dist matmul remove mp mode
      
      * update dist embedding
      
      * dist op init1
      
      * dist op init 2
      
      * update unitest
      
      * context remove parallel mode
      
      * partitioner remove parallel mode
      
      * update unitest
      
      * a more general method to support varying mesh in pipeline parallel
      
      * support varying mesh in pipeline parallel
      
      * embedding support varying mesh in pipeline parallel
      
      * matmul support varying mesh in pipeline parallel
      
      * default dist op support varying mesh in pipeline parallel
      
      * dist attribute for startup program
      
      * default dist op support varying mesh in pipeline parallel 2
      
      * partitoner support varying mesh in pipeline parallel
      
      * revise logic for auto compeletion
      
      * revise framework.py
      
      * revise reshard unitest
      
      * revise unitest for parallelize
      
      * chmod
      
      * fixed bug for dist embedding name mapping
      
      * Improve the interface and the underlying mechanisms of auto parallel
      
      * revise completion for backward
      
      * revise completion for update
      
      * revise completion for update
      
      * update unitest
      
      * chmod
      
      * bugfix for grad_op output var's mesh
      
      * Modify codes for pr 36744
      
      * Remove unnecessary comments in framework.py
      
      * Remove unnecessary comments in completion.py
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
      Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
      Co-authored-by: JZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>
  28. 27 Oct 2021 (1 commit)
  29. 20 Oct 2021 (1 commit)
    • [Auto Parallel] Generalization for Partition and Completion (#35735) · 797bd40d
      Committed by JZ-LIANG
      * default dist op
      
      * add dist_attr for dist op
      
      * add unitest
      
      * update inputname
      
      * update function name
      
      * add unitest
      
      * update CMakeLists.txt for CI
      
      * fix dis_matmul
      
      * fix compile error
      
      * update matmul to matmul_v2
      
      * unify api
      
      * unify api
      
      * todo
      
      * update distop forward func
      
      * update distop forward func
      
      * auto parallel backward
      
      * update dist op
      
      * autoparallel backward
      
      * add backward for embedding
      
      * temp1
      
      * temp2
      
      * temp3
      
      * temp4
      
      * backward done1
      
      * backward done2
      
      * backward done3
      
      * dist embedding remove mp mode
      
      * dist matmul remove mp mode
      
      * update dist embedding
      
      * dist op init1
      
      * dist op init 2
      
      * update unitest
      
      * context remove parallel mode
      
      * partitioner remove parallel mode
      
      * update unitest
      
      * a more general method to support varying mesh in pipeline parallel
      
      * support varying mesh in pipeline parallel
      
      * embedding support varying mesh in pipeline parallel
      
      * matmul support varying mesh in pipeline parallel
      
      * default dist op support varying mesh in pipeline parallel
      
      * dist attribute for startup program
      
      * default dist op support varying mesh in pipeline parallel 2
      
      * partitoner support varying mesh in pipeline parallel
      
      * revise logic for auto compeletion
      
      * revise framework.py
      
      * revise reshard unitest
      
      * revise unitest for parallelize
      
      * chmod
      
      * fixed bug for dist embedding name mapping
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
  30. 19 Oct 2021 (1 commit)
    • Add auto parallel cost model and unittests (#36363) · a573a7ed
      Committed by YipZLF (see the sketch after this entry)
      * Add auto parallel cost model and unittests
      
      * Fixed code styles.
      
      * Fixed bugs and codes style
      
      * fixed typo
      
      * Improved code style: object encapsulation.
      
      * Fixed codes.
      
      * Refractored estimate_cost
      
      * Fixed typo
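
An auto-parallel cost model of this kind typically scores a candidate plan as the slowest rank's compute time plus its communication time, with communication estimated by a latency-plus-bandwidth (alpha-beta) model. A toy estimate with invented constants (not the PR's cost model):

```python
# Toy alpha-beta cost estimate: per rank, cost = flops / throughput
# + sum over messages of (latency + bytes / bandwidth); plan cost = slowest rank.
ALPHA = 1e-5          # per-message latency in seconds (assumed)
BETA = 1.0 / 10e9     # seconds per byte, i.e. 10 GB/s bandwidth (assumed)
FLOPS = 100e12        # 100 TFLOPS per device (assumed)

def comm_cost(bytes_sent):
    return ALPHA + bytes_sent * BETA

def rank_cost(flops, messages):
    return flops / FLOPS + sum(comm_cost(b) for b in messages)

def plan_cost(per_rank):
    """per_rank: list of (flops, [message_bytes, ...]) tuples, one per rank."""
    return max(rank_cost(f, msgs) for f, msgs in per_rank)

plan = [(2e12, [4e6, 4e6]), (2e12, [4e6])]   # two ranks with different comm volume
print(f"estimated step time: {plan_cost(plan):.4f} s")
```
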
  31. 13 Oct 2021 (2 commits)
  32. 11 Oct 2021 (1 commit)
    • add reshard module (#35779) · c38b0488
      Committed by caozhou (see the sketch after this entry)
      * add reshard module
      
      * fix conflict
      
      * update reshard module
      
      * update and add unitest
      
      * update reshard module and unitest
      
      * add more unitests
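
Resharding converts a tensor from one distribution to another: sharded-to-replicated is an all-gather, while replicated-to-sharded is a local slice. A single-process NumPy simulation of both directions (a sketch of the concept, not Paddle's reshard module):

```python
import numpy as np

def allgather_sim(local_pieces, axis=0):
    """Simulate all-gather: every rank ends up with the full tensor."""
    full = np.concatenate(local_pieces, axis=axis)
    return [full.copy() for _ in local_pieces]

def shard_sim(full, world_size, axis=0):
    """Simulate replicated -> sharded: each rank keeps only its slice."""
    return np.split(full, world_size, axis=axis)

x = np.arange(12).reshape(4, 3)
row_shards = shard_sim(x, 2, axis=0)               # rank0: rows 0-1, rank1: rows 2-3
replicated = allgather_sim(row_shards)             # both ranks now hold all 4 rows
col_shards = shard_sim(replicated[0], 3, axis=1)   # reshard to column-parallel on 3 ranks
print([s.shape for s in col_shards])               # [(4, 1), (4, 1), (4, 1)]
```
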
  33. 16 Sep 2021 (1 commit)
  34. 15 Sep 2021 (1 commit)
    • add dist_attr for dist op and var (#35585) · fc5fb2a1
      Committed by zhaoyingli (see the sketch after this entry)
      * add dist_attr for dist op
      
      * add unitest
      
      * update inputname
      
      * update function name
      
      * add unitest
      
      * update CMakeLists.txt for CI
      
      * fix dis_matmul
      
      * fix compile error
      
      * update matmul to matmul_v2
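
A distributed attribute ties a tensor to a process mesh and a dims_mapping: entry i names the mesh axis that shards tensor dimension i, with -1 meaning replicated. A small sketch that derives the local shard shape from such an attribute (a simplified stand-in, not Paddle's DistAttr class):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ToyDistAttr:
    mesh_shape: List[int]     # e.g. [2, 4]: a 2-by-4 process mesh
    dims_mapping: List[int]   # per tensor dim: mesh axis that shards it, or -1

def local_shape(global_shape, attr):
    """Shape of the shard each process holds (assumes even divisibility)."""
    out = []
    for size, mesh_axis in zip(global_shape, attr.dims_mapping):
        out.append(size if mesh_axis == -1 else size // attr.mesh_shape[mesh_axis])
    return out

attr = ToyDistAttr(mesh_shape=[2, 4], dims_mapping=[-1, 1])  # shard dim 1 over mesh axis 1
print(local_shape([1024, 4096], attr))                       # [1024, 1024]
```
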
  35. 08 Sep 2021 (2 commits)
    • [Auto Parallel] Integrate all modules (#35483) · 12155358
      Committed by Yulong Ao
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * add dist
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * delete unused proto
      
      * resotre op_desc
      
      * restore type_defs
      
      * update var_desc
      
      * remove dimss_mapping for proto_pybind
      
      * update interface.py
      
      * update framework.py
      
      * update
      
      * update
      
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * [WIP] Add the auto completion feature and related codes
      
      * [WIP] Improve the auto completion and related codes
      
      * [WIP] Make the auto completion to support data-parallel
      
      * [WIP] Make the completion support mp and dp+mp
      
      * [WIP] Refactor auto completion unit test for MLP
      
      * [WIP] Refactor the implementation of DistributedOperatorImpl
      
      * [WIP] Improve dims_mapping update rule and fix a bug
      
      * [WIP] Support auto completion for one transformer decoder layer
      
      * [WIP] Add a minor change
      
      * [WIP] Fix a bug within the uint test
      
      * Shard XShape tensor, add embedding completion and refactor code
      
      * Add the distributed_operators dir to setup.py.in
      
      * Improve the completion process and add the unittest for gpt
      
      * fix process_mesh ut
      
      * fix process_mesh ut
      
      * update
      
      * update, test=develop
      
      * Add support for automatically completing distributed attrs of special ops
      
      * update
      
      * update
      
      * update
      
      * fix doc sample codes, test=develop
      
      * improve coverage, test=develop
      
      * add static_mode check, test=develop
      
      * Model the cluster for cost model and physical mapping
      
      * update, test=develop
      
      * add set_placement, test=develop
      
      * Add the check to make sure the candidate tensors' size is great than zero
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update, test=develop
      
      * Auto mark dist attrs annotated by user
      
      * update ndarray to nested list, test=develop
      
      * update, test=develop
      
      * Add auto-completion module for auto-parallel (based on PR#33804)
      
      * Remove unnecessary files
      
      * Remove unrelated files for the auto completion pr
      
      * Update the unit test to improve the coverage
      
      * Modify codes based on reviews
      
      * Minor changes for CI
      
      * Improve some codes based on new comments
      
      * Fix bugs caused by shallow copy in attributes.py
      * Imporve amend_distributed_attr_for_program in context.py
      * Other changes for weihang's comments
      
      * support shard reader
      
      * support shard reader
      
      * add parallel mode
      
      * update process mesh
      
      * add method to compute comm_group
      
      * implement dist_embedding forward func
      
      * implement dist matmul forward func
      
      * implement dist reshape forward func
      
      * add transpiler framework
      
      * add transpiler forward
      
      * implement transpiler forward
      
      * implement transpiler backward & update
      
      * add process
      
      * add unitest
      
      * chmod
      
      * chmod
      
      * chmod
      
      * update unitest
      
      * add unitest for gpt
      
      * remove unused print
      
      * rename transpiler --> partitioner
      
      * rename transpiler --> partitioner
      
      * chmod
      
      * chmod
      
      * bug fixed
      
      * remove amp function
      
      * update case for dp mode
      
      * update case for dp mode
      
      * [Auto Parallel] Integrate all parts with the newest code
      
      * Integrate all parts of auto parallel and improve codes
      
      * Integrate all parts by AutoParallelizer
      * Add unit test for AutoParallelizer
      * Improve auto completion module for pipeline parallel
      * Add support for matmul_v2 in dist_matmul
      * Correct the typo "stratergy" to "strategy"
      
      * Modify distributed_strategy.proto to conform the main stream
      
      * Restore parts of distributed_strategy to conform the develop branch
Co-authored-by: sandyhouse <lilong12@baidu.com>
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
    • add checkers for auto parallel apis (#35486) · 39540b0e
      Committed by lilong12
      * update, test=develop
  36. 02 Sep 2021 (1 commit)
    • [Auto Parallel] Logical Partition & Dist Op (#35117) · a622b701
      Committed by JZ-LIANG (see the sketch after this entry)
      * support shard reader
      
      * support shard reader
      
      * add parallel mode
      
      * update process mesh
      
      * add method to compute comm_group
      
      * implement dist_embedding forward func
      
      * implement dist matmul forward func
      
      * implement dist reshape forward func
      
      * add transpiler framework
      
      * add transpiler forward
      
      * implement transpiler forward
      
      * implement transpiler backward & update
      
      * add process
      
      * add unitest
      
      * chmod
      
      * chmod
      
      * chmod
      
      * update unitest
      
      * add unitest for gpt
      
      * remove unused print
      
      * rename transpiler --> partitioner
      
      * rename transpiler --> partitioner
      
      * chmod
      
      * chmod
      
      * bug fixed
      
      * remove amp function
      
      * update case for dp mode
      
      * update case for dp mode
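
The dist matmul forward mentioned above follows the usual tensor-parallel pattern: with a column-sharded weight, every rank multiplies the full input by its weight slice and the per-rank outputs are gathered along the column axis (the row-sharded variant instead sums partial results with an all-reduce). A single-process NumPy simulation of the column-parallel case (a sketch, not the PR's dist_matmul):

```python
import numpy as np

def column_parallel_matmul(x, w, world_size):
    """Simulate a column-parallel matmul on one process."""
    w_shards = np.split(w, world_size, axis=1)    # each rank owns a slice of columns
    partial = [x @ w_i for w_i in w_shards]       # local matmul per rank
    return np.concatenate(partial, axis=1)        # concatenation stands in for gathering outputs

x = np.random.randn(4, 8)
w = np.random.randn(8, 16)
assert np.allclose(x @ w, column_parallel_matmul(x, w, world_size=4))
```
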