- 11 Jan, 2021 1 commit
  - Committed by WangXi
    * Optimization grad merge performance (#29784)
    * [fleet] combine amp and gradient merge, test=develop (#30086)
    * fix assign_op_xpu concat_op_xpu warining (#30120)
    Co-authored-by: Nliuyuhui <liuyuhui@baidu.com>
- 22 Dec, 2020 1 commit
  - Committed by WangXi
    * gen nccl id use socket (#29431)
    * fix gen_nccl_id_op_helper compile failed, test=develop (#29614)
- 23 Nov, 2020 1 commit
  - Committed by lilong12
    * update, test=develop
- 29 Sep, 2020 2 commits
  - Committed by Chen Weihang
    * remove data parallel scale loss & apply collective_grads
    * move apply in minimize
    * fix failed unittests
  - Committed by lilong12
    * add gloo initializer, test=develop
- 28 Sep, 2020 2 commits
- 04 Sep, 2020 1 commit
  - Committed by danleifeng
    paddle.distributed.fleet supports dynamic graph execution.
- 28 Aug, 2020 1 commit
  - Committed by Chen Weihang
    * add dygraph parallel run interface
    * polish implement & unified env property name
    * add print config arg
    * refactor init_parallel_env function
    * Compatible with multiprocessing and launch modes
    * set default trainer start port
    * support run in python 2
    * polish python2 support code
    * remove python2 support
    * refine launch import
    * polish dome design details
    * refactor api implemention & path
    * use new method _set_expected_place
    * add spawn unittest framework & mnist test
    * add more unittests & doc
    * fix unittest failed
    * polish english doc
    * self review and polish details
    * refactor code by reviewer's comments
    * fix unittest failed
    * fix parallel_env unittest
    * fix several typos
    * fix error introduced when fixing typos
    * add unpublic note for start_processes
    * polish details by xiaoguang's comment
    * verify correctly when spawn nprocs=-1
    * refactor spawn & init_parallel_env design
    * polish doc details
    * open spawn unittests
    * try to fix doc compile error
    * try to fix unknown doc format error
    * add skip unittest when not gpu
- 08 Jul, 2020 1 commit
- 02 Jul, 2020 1 commit
  - Committed by tangwei12
    * disable distributed UT temporary,enable it soon, test=develop
- 10 Mar, 2020 1 commit
  - Committed by WangXi
- 31 Dec, 2019 1 commit
  - Committed by WangXi
- 13 Dec, 2019 1 commit
  - Committed by WangXi
- 13 Nov, 2019 1 commit
  - Committed by gongweibao
    use 2 cards test=develop
- 12 Nov, 2019 1 commit
  - Committed by lilong12
    modify the implementation of save_persistables and save_inference_model for fleet collective mode (#20802)
    * modify the implementation of save_persistables and save_inference_model functions for fleet collective, test=develop
    * add ut, test=develop
- 22 Oct, 2019 2 commits
  - Committed by gongweibao
  - Committed by gongweibao
- 18 Oct, 2019 2 commits
  - Committed by WangXi
  - Committed by gongweibao
- 16 Oct, 2019 1 commit
  - Committed by gongweibao
- 15 Oct, 2019 1 commit
  - Committed by WangXi
- 14 Oct, 2019 1 commit
  - Committed by gongweibao
    Add detail logs on resnet unit test
- 09 Oct, 2019 1 commit
  - Committed by gongweibao
- 27 Sep, 2019 1 commit
  - Committed by tangwei12
    * add a base class for the Communicator
    * add AsyncCommunicator Impl for async distributed training
- 28 Aug, 2019 1 commit
  - Committed by Yi Liu
    test=develop
- 22 Aug, 2019 1 commit
  - Committed by chengduo
    * update parallel.py test=develop
- 19 Aug, 2019 1 commit
  - Committed by kh2se2013
    add python coverage launch when WITH_COVERAGE=ON
- 12 Aug, 2019 1 commit
  - Committed by gongweibao
    Polish fleet API to support cuda collective mode and nccl2 mode
- 10 Aug, 2019 1 commit
  - Committed by Zeng Jinle
    * deprecate python memory optimize, test=develop
    * remove memory_optimize in unittests, test=develop
    * add unittests to deprecated interfaces, test=develop
- 09 Aug, 2019 1 commit
  - Committed by chengduo
    * Enhance fuse optimization op pass test=develop
- 11 Jul, 2019 1 commit
  - Committed by gongweibao
- 21 Jun, 2019 1 commit
  - Committed by guru4elephant
    * add more print function for timeout issue, make timeout value larger
- 16 Jun, 2019 1 commit
  - Committed by guru4elephant
    * add class name and timeline for test_dist_base.py
- 14 Jun, 2019 2 commits
  - Committed by guru4elephant
    * add print log for unittest of distributed training test=develop
  - Committed by gongweibao
- 06 Jun, 2019 1 commit
  - Committed by gongweibao
- 27 May, 2019 1 commit
  - Committed by gongweibao
- 17 May, 2019 1 commit
  - Committed by Yan Xu
    * add var grad hook test=develop
- 25 Apr, 2019 1 commit
  - Committed by Yan Xu
    implement dygraph.parallel.DataParallel to hook reduce op.