- 19 Oct, 2022 1 commit
-
-
Committed by Nyakku Shigure
-
- 14 Oct, 2022 1 commit
-
-
Committed by Wen Sun
-
- 13 Oct, 2022 1 commit
-
-
Committed by Xinger
* add rpc module in cpp side * add rpc module in python side * support win32 and mac for rpc * code optimization * optimize code * update rpc * update rpc launch * rpc remove rank and world_size api * fix logger import bug * remove support for win and mac * remove support for xpu, npu, cinn and rocm * remove support for xpu, npu, cinn and rocm * fix shutdown barrier timeout bug * update:python_rpc_handler to shared ptr * fix master shutdown first bug * tests support for cpu * update log to vlog * update get service info api * add single process test case * remove process group * remove some useless dependencies * update rpc api comments * update rpc comments: Example to Examples * update rpc api comments * update rpc api comments * update launch api comments * update init_rpc comments * update rpc sync and async comments * fix bug: init_rpc can't be called repeatedly in a process * update rpc api comment: make master endpoint unique * update rpc api:service to worker, timeout_ms to timeout * rename ServiceInfo to WorkerInfo * refactor: rename server to worker, log to vlog * add launch test * remove unused codes * refine
-
- 20 Sep, 2022 1 commit
-
-
Committed by Roc
Unify the logger manager in FleetAPI. Hide APIs under distributed/utils that users don't need.
-
- 31 Aug, 2022 1 commit
-
-
Committed by LiYuRio
-
- 28 Jul, 2022 1 commit
-
-
Committed by LiYuRio
-
- 11 Jul, 2022 1 commit
-
-
Committed by Haohongxiang
* fix conflict * new pg apis * add docs of new apis * update * fix coverage * update * fix bug * fix reduce scatter * fix api * update Co-authored-by: ForFishes <2282912238@qq.com>
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
* use yapf to format all Python files * exclude two unittest files from yapf because they rely on writing and reading files, and formatting would break them * disable diff_py_file because too many diff files cause the following command to fail
-
- 12 Apr, 2022 1 commit
-
-
Committed by Yanxing Shi
-
- 23 Mar, 2022 1 commit
-
-
Committed by kuizhiqing
-
- 09 Mar, 2022 1 commit
-
-
Committed by Baibaifan
-
- 26 Nov, 2021 1 commit
-
-
Committed by zhaocaibei123
* test * test * rm test * update * update * update * add unittest * update * update save
-
- 29 Oct, 2021 1 commit
-
-
Committed by Yulong Ao
* default dist op * add dist_attr for dist op * add unittest * update inputname * update function name * add unittest * update CMakeLists.txt for CI * fix dis_matmul * fix compile error * update matmul to matmul_v2 * unify api * unify api * todo * update distop forward func * update distop forward func * auto parallel backward * update dist op * autoparallel backward * add backward for embedding * temp1 * temp2 * temp3 * temp4 * backward done1 * backward done2 * backward done3 * dist embedding remove mp mode * dist matmul remove mp mode * update dist embedding * dist op init1 * dist op init 2 * update unittest * context remove parallel mode * partitioner remove parallel mode * update unittest * a more general method to support varying mesh in pipeline parallel * support varying mesh in pipeline parallel * embedding support varying mesh in pipeline parallel * matmul support varying mesh in pipeline parallel * default dist op support varying mesh in pipeline parallel * dist attribute for startup program * default dist op support varying mesh in pipeline parallel 2 * partitioner support varying mesh in pipeline parallel * revise logic for auto completion * revise framework.py * revise reshard unittest * revise unittest for parallelize * chmod * fixed bug for dist embedding name mapping * Improve the interface and the underlying mechanisms of auto parallel * revise completion for backward * revise completion for update * revise completion for update * update unittest * chmod * bugfix for grad_op output var's mesh * Modify codes for pr 36744 * Remove unnecessary comments in framework.py * Remove unnecessary comments in completion.py Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com> Co-authored-by: zhaoyingli <zhaoyingli@baidu.com> Co-authored-by: JZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>
-
- 18 Sep, 2021 1 commit
-
-
Committed by Guoxia Wang
* fix bug
-
- 17 Sep, 2021 1 commit
-
-
Committed by Guoxia Wang
* add launch doc
-
- 08 Sep, 2021 1 commit
-
-
Committed by lilong12
* update, test=develop
-
- 24 Aug, 2021 1 commit
-
-
Committed by Yulong Ao
* add auto_parallel dir * mv to paddle.distributed * add shard_xx api * add distributed attrs for var * add ut, test=develop * add dist * update * update * update * update * update * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update, test=develop * update * update * update * update * update * update, test=develop * update, test=develop * update * update * delete unused proto * restore op_desc * restore type_defs * update var_desc * remove dimss_mapping for proto_pybind * update interface.py * update framework.py * update * update * add auto_parallel dir * mv to paddle.distributed * add shard_xx api * add distributed attrs for var * add ut, test=develop * [WIP] Add the auto completion feature and related codes * [WIP] Improve the auto completion and related codes * [WIP] Make the auto completion to support data-parallel * [WIP] Make the completion support mp and dp+mp * [WIP] Refactor auto completion unit test for MLP * [WIP] Refactor the implementation of DistributedOperatorImpl * [WIP] Improve dims_mapping update rule and fix a bug * [WIP] Support auto completion for one transformer decoder layer * [WIP] Add a minor change * [WIP] Fix a bug within the unit test * Shard XShape tensor, add embedding completion and refactor code * Add the distributed_operators dir to setup.py.in * Improve the completion process and add the unittest for gpt * fix process_mesh ut * fix process_mesh ut * update * update, test=develop * Add support for automatically completing distributed attrs of special ops * update * update * update * fix doc sample codes, test=develop * improve coverage, test=develop * add static_mode check, test=develop * Model the cluster for cost model and physical mapping * update, test=develop * add set_placement, test=develop * Add the check to make sure the candidate tensors' size is greater than zero * update doc, test=develop * update doc, test=develop * update doc, test=develop * update doc, test=develop * update, test=develop * Auto mark dist attrs annotated by user * update ndarray to nested list, test=develop * update, test=develop * Add auto-completion module for auto-parallel (based on PR#33804) * Remove unnecessary files * Remove unrelated files for the auto completion pr * Update the unit test to improve the coverage * Modify codes based on reviews * Minor changes for CI * Improve some codes based on new comments * Fix bugs caused by shallow copy in attributes.py * Improve amend_distributed_attr_for_program in context.py * Other changes for weihang's comments Co-authored-by: sandyhouse <lilong12@baidu.com>
-
- 23 Aug, 2021 1 commit
-
-
Committed by Bo Liu
-
- 11 Aug, 2021 1 commit
-
-
Committed by lilong12
* add auto_parallel apis
-
- 06 May, 2021 1 commit
-
-
Committed by zhiboniu
-
- 24 Feb, 2021 1 commit
-
-
Committed by tangwei12
* fix entry * fix distributed lookup table fuse case * fix entry bug at first time * move entry from paddle.fluid -> paddle.distributed * fix ut with paddle.enable_static() Co-authored-by: malin10 <malin10@baidu.com>
-
- 08 Jan, 2021 1 commit
-
-
Committed by Chen Weihang
-
- 28 Sep, 2020 1 commit
-
-
Committed by yaoxuefeng
-
- 16 Sep, 2020 1 commit
-
-
Committed by yaoxuefeng
-
- 29 Aug, 2020 1 commit
-
-
Committed by Dong Daxiang
* fix api document
-
- 28 Aug, 2020 1 commit
-
-
Committed by Chen Weihang
* add dygraph parallel run interface * polish implementation & unified env property name * add print config arg * refactor init_parallel_env function * Compatible with multiprocessing and launch modes * set default trainer start port * support run in python 2 * polish python2 support code * remove python2 support * refine launch import * polish dome design details * refactor api implementation & path * use new method _set_expected_place * add spawn unittest framework & mnist test * add more unittests & doc * fix unittest failed * polish English doc * self review and polish details * refactor code by reviewer's comments * fix unittest failed * fix parallel_env unittest * fix several typos * fix error introduced when fixing typos * add unpublic note for start_processes * polish details by xiaoguang's comment * verify correctly when spawn nprocs=-1 * refactor spawn & init_parallel_env design * polish doc details * open spawn unittests * try to fix doc compile error * try to fix unknown doc format error * add skip unittest when not gpu
-
- 27 Aug, 2020 1 commit
-
-
Committed by lilong12
add collective op for cpu using gloo and paddle.distributed.* apis
-
- 07 Jul, 2020 1 commit
-
-
Committed by gongweibao
-
- 08 May, 2020 1 commit
-
-
Committed by zhangchunle
-
- 12 Feb, 2019 1 commit
-
-
Committed by Yan Xu
* add launch mp distributed mode module test=develop * delete unused file test=develop * refine usage test=develop * refine usage test=develop * move distributed package test=develop * add to whl package test=develop
-
- 24 Jan, 2019 1 commit
-
-
Committed by WangZhen
-
- 24 Dec, 2018 1 commit
-
-
Committed by whs
* Init slim. * Remove distillation demo. * Fix import errors. test=develop * Fix some issues. test=develop * Fix configs. test=develop * Modify API.spec. test=develop * Fix format. test=develop * Fix format. test=develop * Add some comments.
-
- 02 Jul, 2018 1 commit
-
-
Committed by Xin Pan
-
- 09 Dec, 2016 1 commit
-
-
Committed by Yi Wang
-
- 12 Nov, 2016 1 commit
-
-
Committed by qijun
-
- 29 Aug, 2016 1 commit
-
-
Committed by zhangjinchao01
ISSUE=4586495 git-svn-id: https://svn.baidu.com/idl/trunk/paddle@1408 1ad973e4-5ce8-4261-8a94-b56d1f490c56
-