- 04 Sep, 2020 1 commit
-
-
Committed by danleifeng
paddle.distributed.fleet supports dynamic graph execution.
-
- 03 Sep, 2020 2 commits
-
-
Committed by danleifeng
* print detailed and clear log infos; test=develop
-
Committed by danleifeng
* support running python train.py for fleet-task; test=develop
-
- 02 Sep, 2020 2 commits
- 31 Aug, 2020 2 commits
-
-
Committed by Chen Weihang
* remove backend argument of init_parallel_env
* remove keep name table in transformer
* add cpu version check
* add skip unittest for init_parallel_env
* polish doc: remove func use & update example
-
Committed by tangwei12
* add FleetAPI doc
Co-authored-by: liuyuhui <liuyuhui@baidu.com>
-
- 30 Aug, 2020 1 commit
-
-
Committed by Chengmo
* Support Heter Parameter Server
-
- 29 Aug, 2020 1 commit
-
-
Committed by Dong Daxiang
* fix api document
-
- 28 Aug, 2020 1 commit
-
-
Committed by Chen Weihang
* add dygraph parallel run interface
* polish implement & unified env property name
* add print config arg
* refactor init_parallel_env function
* Compatible with multiprocessing and launch modes
* set default trainer start port
* support run in python 2
* polish python2 support code
* remove python2 support
* refine launch import
* polish dome design details
* refactor api implemention & path
* use new method _set_expected_place
* add spawn unittest framework & mnist test
* add more unittests & doc
* fix unittest failed
* polish english doc
* self review and polish details
* refactor code by reviewer's comments
* fix unittest failed
* fix parallel_env unittest
* fix several typos
* fix error introduced when fixing typos
* add unpublic note for start_processes
* polish details by xiaoguang's comment
* verify correctly when spawn nprocs=-1
* refactor spawn & init_parallel_env design
* polish doc details
* open spawn unittests
* try to fix doc compile error
* try to fix unknown doc format error
* add skip unittest when not gpu
-
- 27 Aug, 2020 2 commits
- 26 Aug, 2020 1 commit
-
-
Committed by JZ-LIANG
-
- 25 Aug, 2020 1 commit
-
-
Committed by Dong Daxiang
* add cudnn related strategies to DistributedStrategy
-
- 24 Aug, 2020 1 commit
-
-
Committed by WangXi
-
- 22 Aug, 2020 1 commit
-
-
Committed by liuyuhui
* solve the initial configuration about fleet and rolemaker
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
-
- 21 Aug, 2020 2 commits
-
-
Committed by Dong Daxiang
* add documentation for DistributedStrategy
-
Committed by Dong Daxiang
* consider the combination of different strategies to work together
-
- 20 Aug, 2020 1 commit
-
-
Committed by 123malin
* add save/load for parameter server
-
- 18 Aug, 2020 1 commit
-
-
Committed by mapingshuo
* add feature to fleet2.0 role_maker, distribute_strategy, test=develop
-
- 17 Aug, 2020 3 commits
-
-
Committed by Dong Daxiang
* add check approval test=develop
-
Committed by Yi Liu
* make fleet_localsgd_meta_optimizer work
* fix bug in localsgd meta optimizer
-
Committed by Qinghe JING
* set default value to strategy in distributed_optimizer test=develop
-
- 14 Aug, 2020 1 commit
-
-
Committed by vslyu
* add unittest for paddlerolemaker with gloo
-
- 13 Aug, 2020 1 commit
-
-
Committed by Dong Daxiang
* move paddle.fleet to paddle.distributed.fleet
-
- 27 Jul, 2020 1 commit
-
-
Committed by Yi Liu
test=develop
-
- 07 Jul, 2020 1 commit
-
-
Committed by gongweibao
-
- 02 Jul, 2020 1 commit
-
-
Committed by Yi Liu
* fix the compatibility of PY2 and PY3 in paddle.distributed.launch test=develop
* only pull log of local rank 0 test=develop
* log exception if UnicodeEncodeError occurs when pulling log in paddle.distributed.launch test=develop
Co-authored-by: SunGaofeng <peakbee@gmail.com>
-
- 30 Jun, 2020 1 commit
-
-
Committed by Yi Liu
test=develop
-
- 28 May, 2020 1 commit
-
-
Committed by mapingshuo
replace join to terminate
-
- 08 May, 2020 1 commit
-
-
Committed by zhangchunle
-
- 21 Apr, 2020 1 commit
-
-
Committed by Kaipeng Deng
* add DataLoader, Dataset, BatchSampler
-
- 09 Apr, 2020 1 commit
-
-
Committed by gongweibao
-
- 03 Apr, 2020 1 commit
-
-
Committed by gongweibao
-
- 09 Mar, 2020 1 commit
-
-
Committed by gongweibao
-
- 23 Feb, 2020 1 commit
-
-
Committed by tianshuo78520a
-
- 21 Jan, 2020 1 commit
-
-
Committed by gongweibao
-
- 14 Jan, 2020 1 commit
-
-
Committed by danleifeng
-
- 18 Nov, 2019 1 commit
-
-
Committed by danleifeng
-
- 02 Nov, 2019 1 commit
-
-
Committed by Dong Daxiang
* add launch_ps module so that we can launch a parameter server training job
  1) a user can specify worker_num and server_num
  2) parameter server can be killed after all workers exit
  3) unit test is added
  test=develop
-