- 11 Jan, 2021 1 commit
-
-
Committed by WangXi
* Optimization grad merge performance (#29784) * [fleet] combine amp and gradient merge, test=develop (#30086) * fix assign_op_xpu concat_op_xpu warining (#30120) Co-authored-by: Nliuyuhui <liuyuhui@baidu.com>
-
- 22 Dec, 2020 1 commit
-
-
Committed by WangXi
* gen nccl id use socket (#29431) * fix gen_nccl_id_op_helper compile failed, test=develop (#29614)
-
- 23 Nov, 2020 1 commit
-
-
Committed by lilong12
* update, test=develop
-
- 29 Sep, 2020 2 commits
-
-
Committed by Chen Weihang
* remove data parallel scale loss & apply collective_grads * move apply in minimize * fix failed unittests
-
Committed by lilong12
* add gloo initializer, test=develop
-
- 28 Sep, 2020 2 commits
- 04 Sep, 2020 1 commit
-
-
Committed by danleifeng
paddle.distributed.fleet supports dynamic graph execution.
-
- 28 Aug, 2020 1 commit
-
-
Committed by Chen Weihang
* add dygraph parallel run interface * polish implement & unified env property name * add print config arg * refactor init_parallel_env function * Compatible with multiprocessing and launch modes * set default trainer start port * support run in python 2 * polish python2 support code * remove python2 support * refine launch import * polish dome design details * refactor api implemention & path * use new method _set_expected_place * add spawn unittest framework & mnist test * add more unittests & doc * fix unittest failed * polish english doc * self review and polish details * refactor code by reviewer's comments * fix unittest failed * fix parallel_env unittest * fix several typos * fix error introduced when fixing typos * add unpublic note for start_processes * polish details by xiaoguang's comment * verify correctly when spawn nprocs=-1 * refactor spawn & init_parallel_env design * polish doc details * open spawn unittests * try to fix doc compile error * try to fix unknown doc format error * add skip unittest when not gpu
-
- 08 Jul, 2020 1 commit
-
- 02 Jul, 2020 1 commit
-
-
Committed by tangwei12
* disable distributed UT temporary,enable it soon, test=develop
-
- 10 Mar, 2020 1 commit
-
-
Committed by WangXi
-
- 31 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 13 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 13 Nov, 2019 1 commit
-
-
Committed by gongweibao
use 2 cards test=develop
-
- 12 Nov, 2019 1 commit
-
-
Committed by lilong12
modify the implementation of save_persistables and save_inference_model for fleet collective mode (#20802) * modify the implementation of save_persistables and save_inference_model functions for fleet collective, test=develop * add ut, test=develop
-
- 22 Oct, 2019 2 commits
-
-
Committed by gongweibao
-
Committed by gongweibao
-
- 18 Oct, 2019 2 commits
-
-
Committed by WangXi
-
Committed by gongweibao
-
- 16 Oct, 2019 1 commit
-
-
Committed by gongweibao
-
- 15 Oct, 2019 1 commit
-
-
Committed by WangXi
-
- 14 Oct, 2019 1 commit
-
-
Committed by gongweibao
Add detail logs on resnet unit test
-
- 09 Oct, 2019 1 commit
-
-
Committed by gongweibao
-
- 27 Sep, 2019 1 commit
-
-
Committed by tangwei12
* add a base class for the Communicator * add AsyncCommunicator Impl for async distributed training
-
- 28 Aug, 2019 1 commit
-
-
Committed by Yi Liu
test=develop
-
- 22 Aug, 2019 1 commit
-
-
Committed by chengduo
* update parallel.py test=develop
-
- 19 Aug, 2019 1 commit
-
-
Committed by kh2se2013
add python coverage launch when WITH_COVERAGE=ON
-
- 12 Aug, 2019 1 commit
-
-
Committed by gongweibao
Polish fleet API to support cuda collective mode and nccl2 mode
-
- 10 Aug, 2019 1 commit
-
-
Committed by Zeng Jinle
* deprecate python memory optimize, test=develop * remove memory_optimize in unittests, test=develop * add unittests to deprecated interfaces, test=develop
-
- 09 Aug, 2019 1 commit
-
-
Committed by chengduo
* Enhance fuse optimization op pass test=develop
-
- 11 Jul, 2019 1 commit
-
-
Committed by gongweibao
-
- 21 Jun, 2019 1 commit
-
-
Committed by guru4elephant
* add more print function for timeout issue, make timeout value larger
-
- 16 Jun, 2019 1 commit
-
-
Committed by guru4elephant
* add class name and timeline for test_dist_base.py
-
- 14 Jun, 2019 2 commits
-
-
Committed by guru4elephant
* add print log for unittest of distributed training test=develop
-
Committed by gongweibao
-
- 06 Jun, 2019 1 commit
-
-
Committed by gongweibao
-
- 27 May, 2019 1 commit
-
-
Committed by gongweibao
-
- 17 May, 2019 1 commit
-
-
Committed by Yan Xu
* add var grad hook test=develop
-
- 25 Apr, 2019 1 commit
-
-
Committed by Yan Xu
implement dygraph.parallel.DataParallel to hook reduce op.
-