    [Auto Parallel] Integrate all modules (#35483) · commit 12155358
    Committed by Yulong Ao
    * add auto_parallel dir
    
    * mv to paddle.distributed
    
    * add shard_xx api
    
    * add distributed attrs for var
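
    A minimal sketch of the idea behind the shard_xx annotations and the
    per-variable distributed attributes; ProcessMesh, DistAttr, and
    shard_tensor below are simplified stand-ins, not Paddle's actual
    signatures:

        from dataclasses import dataclass
        from typing import Dict, List

        @dataclass
        class ProcessMesh:
            # Cartesian topology of ranks, e.g. [[0, 1], [2, 3]] is a 2x2 mesh.
            mesh: List[List[int]]

        @dataclass
        class DistAttr:
            process_mesh: ProcessMesh
            # For each tensor dim: the mesh axis it is sharded along, -1 if replicated.
            dims_mapping: List[int]

        _DIST_ATTRS: Dict[str, DistAttr] = {}  # var name -> its distributed attrs

        def shard_tensor(name, mesh, dims_mapping):
            # Annotation only: record how the named tensor is laid out on the mesh.
            _DIST_ATTRS[name] = DistAttr(mesh, dims_mapping)
            return _DIST_ATTRS[name]

        # A [batch, hidden] tensor: batch split along mesh axis 0, hidden replicated.
        shard_tensor("input", ProcessMesh([[0, 1], [2, 3]]), dims_mapping=[0, -1])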
    
    * add ut, test=develop
    
    * add dist
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update, test=develop
    
    * update, test=develop
    
    * update, test=develop
    
    * update, test=develop
    
    * update, test=develop
    
    * update, test=develop
    
    * update, test=develop
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update
    
    * update, test=develop
    
    * update, test=develop
    
    * update
    
    * update
    
    * delete unused proto
    
    * restore op_desc
    
    * restore type_defs
    
    * update var_desc
    
    * remove dims_mapping for proto_pybind
    
    * update interface.py
    
    * update framework.py
    
    * update
    
    * update
    
    * add auto_parallel dir
    
    * mv to paddle.distributed
    
    * add shard_xx api
    
    * add distributed attrs for var
    
    * add ut, test=develop
    
    * [WIP] Add the auto completion feature and related codes
    
    * [WIP] Improve the auto completion and related codes
    
    * [WIP] Make the auto completion to support data-parallel
    
    * [WIP] Make the completion support mp and dp+mp
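
    For intuition: on a 2D mesh with axis 0 as the data-parallel (dp) axis
    and axis 1 as the model-parallel (mp) axis, the three modes differ only
    in which tensor dims map to which mesh axes. A sketch of the convention
    assumed in these notes:

        mesh = [[0, 1], [2, 3]]     # axis 0: dp, axis 1: mp

        # dims_mapping[i] = mesh axis sharding tensor dim i, or -1 if replicated.
        dp_activation = [0, -1]     # [batch, hidden]: batch split across the dp axis
        mp_weight = [-1, 1]         # [in, out]: output columns split across the mp axis
        dp_mp_output = [0, 1]       # hybrid dp+mp: sharded on both mesh axes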
    
    * [WIP] Refactor auto completion unit test for MLP
    
    * [WIP] Refactor the implementation of DistributedOperatorImpl
    
    * [WIP] Improve dims_mapping update rule and fix a bug
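
    The completion pass repeatedly reconciles the dims_mapping of producers
    and consumers until nothing changes. A toy version of such an update
    rule (the merge logic is illustrative, not Paddle's exact algorithm):

        def compatible_dim(d1, d2):
            # -1 (replicated) yields to a concrete mesh axis; two concrete
            # axes must agree, otherwise the annotations conflict.
            if d1 == -1:
                return d2
            if d2 == -1 or d1 == d2:
                return d1
            return None

        def update_dims_mapping(a, b):
            # Merge two candidate mappings elementwise; None marks a conflict.
            merged = [compatible_dim(x, y) for x, y in zip(a, b)]
            return None if None in merged else merged

        assert update_dims_mapping([0, -1], [-1, 1]) == [0, 1]
        assert update_dims_mapping([0, -1], [1, -1]) is None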
    
    * [WIP] Support auto completion for one transformer decoder layer
    
    * [WIP] Add a minor change
    
    * [WIP] Fix a bug within the unit test
    
    * Shard XShape tensor, add embedding completion and refactor code
    
    * Add the distributed_operators dir to setup.py.in
    
    * Improve the completion process and add the unit test for GPT
    
    * fix process_mesh ut
    
    * fix process_mesh ut
    
    * update
    
    * update, test=develop
    
    * Add support for automatically completing distributed attrs of special ops
    
    * update
    
    * update
    
    * update
    
    * fix doc sample codes, test=develop
    
    * improve coverage, test=develop
    
    * add static_mode check, test=develop
    
    * Model the cluster for cost model and physical mapping
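
    The cluster model gives the cost model something to price communication
    and mapping decisions against. A bare-bones sketch (class and field
    names are hypothetical):

        from dataclasses import dataclass

        @dataclass
        class Device:
            global_id: int
            machine: int
            kind: str = "GPU"

        @dataclass
        class Link:
            src: int
            dst: int
            bandwidth_gbs: float   # e.g. NVLink vs. cross-machine Ethernet
            latency_us: float

        # Two machines with two GPUs each; intra-machine links are far faster.
        devices = [Device(i, machine=i // 2) for i in range(4)]
        links = [Link(0, 1, 300.0, 1.0), Link(2, 3, 300.0, 1.0),
                 Link(0, 2, 12.0, 10.0), Link(1, 3, 12.0, 10.0)]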
    
    * update, test=develop
    
    * add set_placement, test=develop
    
    * Add the check to make sure the candidate tensors' size is greater than zero
    
    * update doc, test=develop
    
    * update doc, test=develop
    
    * update doc, test=develop
    
    * update doc, test=develop
    
    * update, test=develop
    
    * Auto mark dist attrs annotated by user
    
    * update ndarray to nested list, test=develop
    
    * update, test=develop
    
    * Add auto-completion module for auto-parallel (based on PR#33804)
    
    * Remove unnecessary files
    
    * Remove unrelated files for the auto completion pr
    
    * Update the unit test to improve the coverage
    
    * Modify codes based on reviews
    
    * Minor changes for CI
    
    * Improve some codes based on new comments
    
    * Fix bugs caused by shallow copy in attributes.py
    * Improve amend_distributed_attr_for_program in context.py
    * Other changes for Weihang's comments
    
    * support shard reader
    
    * support shard reader
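
    Sharding the reader means each data-parallel rank consumes a disjoint
    slice of the sample stream; a minimal rank-strided sketch:

        def shard_reader(reader, dp_rank, dp_degree):
            # Yield every dp_degree-th sample, offset by this rank's index,
            # so ranks see disjoint, equally sized shards of the stream.
            def sharded():
                for i, sample in enumerate(reader()):
                    if i % dp_degree == dp_rank:
                        yield sample
            return sharded

        full = lambda: iter(range(8))
        print(list(shard_reader(full, dp_rank=1, dp_degree=2)()))  # [1, 3, 5, 7]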
    
    * add parallel mode
    
    * update process mesh
    
    * add method to compute comm_group
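
    Communication groups fall out of the mesh topology: ranks that share
    all mesh coordinates except the one on a given axis form one collective
    group. A small numpy sketch (function name hypothetical):

        import numpy as np

        def comm_groups(mesh, axis):
            # Group ranks that differ only along `axis`; each group backs one
            # communicator (e.g. the allreduce group for that mesh axis).
            m = np.asarray(mesh)
            moved = np.moveaxis(m, axis, -1)       # bring `axis` last
            return [list(g) for g in moved.reshape(-1, m.shape[axis])]

        mesh = [[0, 1], [2, 3]]
        print(comm_groups(mesh, axis=0))  # [[0, 2], [1, 3]]
        print(comm_groups(mesh, axis=1))  # [[0, 1], [2, 3]]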
    
    * implement dist_embedding forward func
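
    Row-parallel embedding shards the vocabulary: each rank looks up only
    the ids falling in its vocab range and the partial results are summed
    across the group (an allreduce in a real run). Simulated with numpy:

        import numpy as np

        def dist_embedding_forward(table_shards, ids):
            vocab_per_rank, dim = table_shards[0].shape
            out = np.zeros((len(ids), dim))
            for rank, shard in enumerate(table_shards):   # one iteration per rank
                lo = rank * vocab_per_rank
                mask = (ids >= lo) & (ids < lo + vocab_per_rank)
                out[mask] = shard[ids[mask] - lo]         # local lookup; rest stays 0
            return out   # summing into one zero buffer stands in for allreduce(sum)

        table = np.arange(12.0).reshape(6, 2)             # vocab=6, dim=2
        shards = np.split(table, 2, axis=0)
        ids = np.array([0, 5, 3])
        assert np.allclose(dist_embedding_forward(shards, ids), table[ids])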
    
    * implement dist matmul forward func
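
    Row-parallel matmul is the canonical distributed operator: each rank
    multiplies its K-dim shards locally and an allreduce over the mp group
    restores the full product. Simulated with numpy in place of collectives:

        import numpy as np

        def dist_matmul_forward(x_shards, w_shards):
            # Per-rank partial products over each K shard ...
            partials = [x @ w for x, w in zip(x_shards, w_shards)]
            # ... summed together, standing in for allreduce(sum).
            return sum(partials)

        rng = np.random.default_rng(0)
        x, w = rng.standard_normal((4, 6)), rng.standard_normal((6, 5))
        x_shards, w_shards = np.split(x, 2, axis=1), np.split(w, 2, axis=0)
        assert np.allclose(dist_matmul_forward(x_shards, w_shards), x @ w)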
    
    * implement dist reshape forward func
    
    * add transpiler framework
    
    * add transpiler forward
    
    * implement transpiler forward
    
    * implement transpiler backward & update
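
    The partitioner's core shape arithmetic: a sharded dim shrinks by the
    size of its mesh axis, a replicated dim keeps its global size. A sketch
    under the -1-means-replicated convention:

        def local_shape(global_shape, dims_mapping, mesh_topology):
            # mesh_topology[i] is the number of ranks along mesh axis i.
            local = []
            for size, axis in zip(global_shape, dims_mapping):
                if axis == -1:
                    local.append(size)          # replicated: keep full dim
                else:
                    assert size % mesh_topology[axis] == 0
                    local.append(size // mesh_topology[axis])
            return local

        # [batch=64, hidden=1024] on a 2x4 mesh: batch on dp axis 0, hidden on mp axis 1.
        print(local_shape([64, 1024], [0, 1], [2, 4]))  # [32, 256]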
    
    * add process
    
    * add unit test
    
    * chmod
    
    * chmod
    
    * chmod
    
    * update unit test
    
    * add unit test for GPT
    
    * remove unused print
    
    * rename transpiler --> partitioner
    
    * rename transpiler --> partitioner
    
    * chmod
    
    * chmod
    
    * bug fixed
    
    * remove amp function
    
    * update case for dp mode
    
    * update case for dp mode
    
    * [Auto Parallel] Integrate all parts with the newest code
    
    * Integrate all parts of auto parallel and improve codes
    
    * Integrate all parts by AutoParallelizer
    * Add unit test for AutoParallelizer
    * Improve auto completion module for pipeline parallel
    * Add support for matmul_v2 in dist_matmul
    * Correct the typo "stratergy" to "strategy"
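
    Putting the pieces together, the parallelizer's flow is roughly:
    complete the user's partial annotations, partition the program for each
    rank, then run the rank-local program. A high-level sketch (class and
    method names hypothetical, not Paddle's API):

        class AutoParallelizer:
            def __init__(self, completer, partitioner):
                self.completer = completer        # fills in missing dist attrs
                self.partitioner = partitioner    # rewrites the program per rank

            def parallelize(self, program, rank):
                # 1. Propagate annotations to every tensor and op (completion).
                annotated = self.completer.complete(program)
                # 2. Carve out this rank's local forward + backward program.
                local_program = self.partitioner.partition(annotated, rank)
                # 3. The caller executes local_program on its mapped device.
                return local_program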
    
    * Modify distributed_strategy.proto to conform the main stream
    
    * Restore parts of distributed_strategy to conform the develop branch
    Co-authored-by: sandyhouse <lilong12@baidu.com>
    Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>