Commits · 40e6c57bed6c8363bb5695284c2cdc1bb61a42fc · PaddlePaddle / Paddle

19 Feb, 2021 · 1 commit
- Remove scale loss before reduce in dygraph (#30807) · 9401173e
  Committed by ShenLiang on Feb 19, 2021
05 Feb, 2021 · 1 commit
- [Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858) · 4a8b8b45
  Committed by liuyuhui on Feb 05, 2021
03 Feb, 2021 · 1 commit
- 【kunlun】dygraph supports multi xpu card training (#30671) · b1026f64
  Committed by WangXi on Feb 03, 2021
04 Jan, 2021 · 1 commit
- Optimization grad merge performance (#29784) · ee16006b
  Committed by WangXi on Jan 04, 2021
14 Dec, 2020 · 1 commit
- gen nccl id use socket (#29431) · 467c7169
  Committed by WangXi on Dec 14, 2020
23 Nov, 2020 · 1 commit
- enable pipeline to run with Executor.run() (#28373) · f77a78cd
  Committed by lilong12 on Nov 23, 2020
```
* update, test=develop
```
29 Sep, 2020 · 2 commits
- Remove DataParallel.scale_loss & apply_collective_grads (#27603) · dec53a9c
  Committed by Chen Weihang on Sep 29, 2020
```
* remove data parallel scale loss & apply collective_grads

* move apply in minimize

* fix failed unittests
```
- Initialize gloo for low level collective apis (#27672) · bbc2add7
  Committed by lilong12 on Sep 29, 2020
```
* add gloo initializer, test=develop
```
28 Sep, 2020 · 2 commits
- Revert "Initialize gloo for low level collective apis (#27356)", test=document_fix (#27665) · 36c04102
  Committed by lilong12 on Sep 28, 2020
- Initialize gloo for low level collective apis (#27356) · fa73e4a2
  Committed by lilong12 on Sep 28, 2020
```
* add gloo initializer, test=develop
```
04 Sep, 2020 · 1 commit
- 【paddle.fleet】distributed_optimizer supports dygraph (#26541) · 6b4ca0d7
  Committed by danleifeng on Sep 04, 2020
```
paddle.distributed.fleet supports dynamic graph execution.
```
28 Aug, 2020 · 1 commit

Add interface to launch parallel dygraph by multiprocessing (#26044) · 31f422ae

Committed by Chen Weihang on Aug 28, 2020

* add dygraph parallel run interface

* polish implement & unified env property name

* add print config arg

* refactor init_parallel_env function

* Compatible with multiprocessing and launch modes

* set default trainer start port

* support run in python 2

* polish python2 support code

* remove python2 support

* refine launch import

* polish dome design details

* refactor api implemention & path

* use new method _set_expected_place

* add spawn unittest framework & mnist test

* add more unittests & doc

* fix unittest failed

* polish english doc

* self review and polish details

* refactor code by reviewer's comments

* fix unittest failed

* fix parallel_env unittest

* fix several typos

* fix error introduced when fixing typos

* add unpublic note for start_processes

* polish details by xiaoguang's comment

* verify correctly when spawn nprocs=-1

* refactor spawn & init_parallel_env design

* polish doc details

* open spawn unittests

* try to fix doc compile error

* try to fix unknown doc format error

* add skip unittest when not gpu

08 Jul, 2020 · 1 commit

Revert/barrier for sync (#25417) · 4b3778a3

Committed by tangwei12 on Jul 08, 2020

* add retry for prefetch

* Revert "Fix/sync barrier (#25016)"

This reverts commit be6a315f.

* reopen dist UT, test=develop

* remove fl UT, test=develop

02 Jul, 2020 · 1 commit
- disable distributed UT temporary (#25300) · 9825a9f3
  Committed by tangwei12 on Jul 02, 2020
```
* disable distributed UT temporary，enable it soon, test=develop
```
10 Mar, 2020 · 1 commit
- Close fuse when use dgc & move DGC strategy from PE to compiler, test=develop (#22914) · 8d47162e
  Committed by WangXi on Mar 10, 2020
31 Dec, 2019 · 1 commit
- fix sync_batch_norm hang in fleet (#21838) · 3ec289a6
  Committed by WangXi on Dec 31, 2019
13 Dec, 2019 · 1 commit
- Tmp fix fleet bug in py35 gcc8 CI, test=develop (#21703) · a2175cfc
  Committed by WangXi on Dec 13, 2019
13 Nov, 2019 · 1 commit
- Use 2 cards for hallreduce unit test. (#21085) · a5fc291f
  Committed by gongweibao on Nov 13, 2019
```
use 2 cards test=develop
```
12 Nov, 2019 · 1 commit

modify the implementation of save_persistables and save_inference_model for fleet collective mode (#20802) · 53148e06

Committed by lilong12 on Nov 12, 2019

* modify the implementation of  save_persistables and save_inference_model functions for fleet collective, test=develop

* add ut, test=develop

22 Oct, 2019 · 2 commits
- Wait pserver to complete initialization. (#20777) · e4251240
  Committed by gongweibao on Oct 22, 2019
- Set unique port to every distribute test to avoid potential port conflicts (#20759) · 8088395a
  Committed by gongweibao on Oct 22, 2019
18 Oct, 2019 · 2 commits
- Fix dgc nan by stripping nccl from sparseReduce. (#20630) · 507afa8a
  Committed by WangXi on Oct 17, 2019
- Disable GRPC_ARG_ALLOW_REUSEPORT to avoid potencial problem. (#20690) · c1710e91
  Committed by gongweibao on Oct 18, 2019
16 Oct, 2019 · 1 commit
- Retry when failed to bind address. (#20642) · f3f52fc1
  Committed by gongweibao on Oct 16, 2019
15 Oct, 2019 · 1 commit
- fix dgc test and bug when not set trainers_endpoints_, test=develop (#20617) · cadc6a97
  Committed by WangXi on Oct 14, 2019
14 Oct, 2019 · 1 commit
- Add detail logs on resnet unit test (#20558) · bf6470c7
  Committed by gongweibao on Oct 14, 2019
```
 Add detail logs on resnet unit test
```
09 Oct, 2019 · 1 commit
- Add bash_test_modules function to capture the timeout or failed context. (#20197) · 89c4b3dd
  Committed by gongweibao on Oct 09, 2019
27 Sep, 2019 · 1 commit

the integrated communicator (#19849) · 8f0b3c05

Committed by tangwei12 on Sep 27, 2019

* add a base class for the Communicator
* add AsyncCommunicator Impl for async distributed training

28 Aug, 2019 · 1 commit
- adapte fleet api for localsgd and support nccl comm configuration in executor (#19443) · 4ef6b845
  Committed by Yi Liu on Aug 28, 2019
```
test=develop
```
22 Aug, 2019 · 1 commit
- [Speedup] Make dygraph data parallel faster (#19280) · 5a579df9
  Committed by chengduo on Aug 22, 2019
```
* update parallel.py
test=develop
```
19 Aug, 2019 · 1 commit
- add python coverage launch when WITH_COVERAGE=ON (#19264) · 27e85625
  Committed by kh2se2013 on Aug 19, 2019
```
add python coverage launch when WITH_COVERAGE=ON
```
12 Aug, 2019 · 1 commit
- Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
10 Aug, 2019 · 1 commit

Try to deprecate unstable python memory optimize (#18983) · c194b0c8

Committed by Zeng Jinle on Aug 10, 2019

* deprecate python memory optimize, test=develop

* remove memory_optimize in unittests, test=develop

* add unittests to deprecated interfaces, test=develop

09 Aug, 2019 · 1 commit
- Enhance fuse optimization op pass (#19010) · 17d62ab2
  Committed by chengduo on Aug 09, 2019
```
* Enhance fuse optimization op pass
test=develop
```
11 Jul, 2019 · 1 commit
- Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748
  Committed by gongweibao on Jul 11, 2019
21 Jun, 2019 · 1 commit
- add more print function for timeout issue, make timeout value larger (#18219) · 7d76e34e
  Committed by guru4elephant on Jun 21, 2019
```
* add more print function for timeout issue, make timeout value larger
```
16 Jun, 2019 · 1 commit
- add class name and timeline for test_dist_base.py (#18122) · 0941e3e0
  Committed by guru4elephant on Jun 16, 2019
```
* add class name and timeline for test_dist_base.py
```
14 Jun, 2019 · 2 commits
- Refine unittest log (#18084) · b2cfdc38
  Committed by guru4elephant on Jun 14, 2019
```
* add print log for unittest of distributed training
test=develop
```
- Fix reinitialized ncclid error! (#18025) · f5caf344
  Committed by gongweibao on Jun 14, 2019
06 Jun, 2019 · 1 commit
- Add backward and optimizer operator dependency pass. (#17746) · fbbdc9cc
  Committed by gongweibao on Jun 06, 2019

PaddlePaddle / Paddle · synced successfully about 1 year ago