- 30 Nov, 2020 (2 commits)
- 27 Nov, 2020 (3 commits)
- Committed by ShenLiang:
  * add reducer (usage sketch below)
  * refine event handling for memory copy
  * add concat & split ops for allreduce
  * apply concat & split to fuse tensors
  * fix nccl dependency
  * fix unittests, a compile problem and a DDP initialization problem
  * fix unittests for mac; add some comments; resolve repeated params in sublayers
  * fix unittests for windows; fix documentation
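
  This commit introduces the gradient reducer behind dygraph data-parallel
  training: per-parameter gradients are fused (concatenated) into buckets
  before allreduce and split back afterwards. A minimal sketch of the
  user-facing flow it serves, assuming the standard paddle.distributed
  dygraph API and a launch via `python -m paddle.distributed.launch`:

  ```python
  import paddle
  import paddle.distributed as dist

  dist.init_parallel_env()                 # set up communicators (e.g. NCCL)

  # DataParallel installs the gradient reducer this commit implements
  model = paddle.DataParallel(paddle.nn.Linear(10, 10))
  opt = paddle.optimizer.SGD(learning_rate=0.01,
                             parameters=model.parameters())

  x = paddle.randn([4, 10])
  loss = model(x).mean()
  loss.backward()                          # reducer allreduces fused gradients
  opt.step()
  opt.clear_grad()
  ```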
- Committed by Chen Long
- Committed by lilong12
- 26 Nov, 2020 (5 commits)
- Committed by ShenLiang:
  * add InMemoryDataset (sketch below)
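
  `InMemoryDataset` loads all training samples into RAM so multiple feeding
  threads can consume them without touching disk. A hedged sketch of the
  intended flow, assuming the paddle.distributed.InMemoryDataset API of this
  era (the filename is hypothetical):

  ```python
  import paddle

  paddle.enable_static()
  slot = paddle.static.data(name="slot", shape=[-1, 1],
                            dtype="int64", lod_level=1)

  dataset = paddle.distributed.InMemoryDataset()
  dataset.init(batch_size=32, thread_num=2,
               pipe_command="cat", use_var=[slot])
  dataset.set_filelist(["train_data.txt"])   # hypothetical file
  dataset.load_into_memory()                 # pull all samples into RAM
  dataset.local_shuffle()                    # in-memory shuffle before training
  ```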
- Committed by JZ-LIANG:
  * add lars to fleet meta optimizer (config sketch below)
  * add lamb to proto and to fleet meta optimizer
  * fixed syntax errors in lamb; add config setter for lamb in distributed_strategy
  * trigger unittests to rerun; add new unittest func for lamb
  * revise unittests for lars, lamb and dgc meta optimizers
  * revise lars and lamb documentation in distributed_strategy.py
  * add weight-decay exclude logic to lars
  * restore optimizer.py as develop, except lars
  * add epsilon and exclude fn to distributed_strategy
  * add lars epsilon
  * revise unittests for fleet lars and lamb for CI coverage
  * revise lars argument api and its doc
  * fix op role
  * add sharding save and add_sync_comm_for_test function
  * add comm_analyse to utils
  * revise sharding_utils; add sharding saving unittest
  * revise sharding en doc; update sharding utils api
  * add doc for sharding
  * fix bug in sharding var size count; update var size count in sharding
  * fix sharding num_nccl_comm
  * Revert "fix sharding num_nccl_comm" (reverts commit d51587c15e9323acf226ddd36154275f0d1daf76)
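
  Taken together, these changes expose LARS and LAMB as switches on fleet's
  DistributedStrategy, including the epsilon and weight-decay exclusion
  options added late in the series. A minimal sketch of the configuration
  surface (keys follow the paddle.distributed.fleet documentation):

  ```python
  import paddle.distributed.fleet as fleet

  strategy = fleet.DistributedStrategy()

  # LARS: layer-wise adaptive rate scaling on top of momentum SGD
  strategy.lars = True
  strategy.lars_configs = {
      "lars_coeff": 0.001,
      "lars_weight_decay": 0.0005,
      "epsilon": 0.0,                              # added by this series
      "exclude_from_weight_decay": ["batch_norm", ".b_0"],
  }

  # LAMB is the analogous switch for Adam-style updates:
  # strategy.lamb = True
  # strategy.lamb_configs = {"lamb_weight_decay": 0.01}

  fleet.init(is_collective=True, strategy=strategy)
  ```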
- Committed by lilong12:
  * update, test=develop
- Committed by WangXi
- Committed by gongweibao
- 24 Nov, 2020 (2 commits)
- 23 Nov, 2020 (1 commit)
- Committed by lilong12:
  * update, test=develop
- 18 Nov, 2020 (1 commit)
- Committed by JZ-LIANG:
  * add lars to fleet meta optimizer
  * add lamb to proto and to fleet meta optimizer
  * fixed syntax errors in lamb; add config setter for lamb in distributed_strategy
  * trigger unittests to rerun; add new unittest func for lamb
  * revise unittests for lars, lamb and dgc meta optimizers
  * revise lars and lamb documentation in distributed_strategy.py
  * add weight-decay exclude logic to lars
  * restore optimizer.py as develop, except lars
  * add epsilon and exclude fn to distributed_strategy
  * add lars epsilon
  * revise unittests for fleet lars and lamb for CI coverage
  * revise lars argument api and its doc
  * fix op role
  * add sharding save and add_sync_comm_for_test function
  * add comm_analyse to utils
  * revise sharding_utils; add sharding saving unittest; revise sharding utils for unittest
- 28 Oct, 2020 (1 commit)
- Committed by Chengmo:
  * fix fleetrun heter ps on paddlecloud
- 26 Oct, 2020 (1 commit)
- Committed by mapingshuo:
  * add sharding (strategy sketch below)
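
  Sharding partitions optimizer state and gradients across ranks so each GPU
  keeps only a slice of the parameters, broadcasting them to peers when
  needed. A hedged sketch of enabling it through DistributedStrategy (the
  fuse_broadcast_MB key follows the fleet documentation of this period):

  ```python
  import paddle.distributed.fleet as fleet

  strategy = fleet.DistributedStrategy()
  strategy.sharding = True
  strategy.sharding_configs = {
      "fuse_broadcast_MB": 32,   # fuse small parameter broadcasts
  }

  fleet.init(is_collective=True, strategy=strategy)
  ```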
- 22 Oct, 2020 (1 commit)
- Committed by WangXi
- 19 Oct, 2020 (1 commit)
- Committed by MRXLT:
  * fleet supports paddle.optimizer (usage sketch below)
  * fix fleet_base
  * bug fixes
  * fix coverage
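
  With this change, fleet.distributed_optimizer accepts the 2.0-style
  paddle.optimizer classes rather than only the legacy fluid optimizers. A
  minimal dygraph sketch:

  ```python
  import paddle
  import paddle.distributed.fleet as fleet

  fleet.init(is_collective=True)

  model = paddle.nn.Linear(10, 1)
  opt = paddle.optimizer.Adam(learning_rate=1e-3,
                              parameters=model.parameters())

  model = fleet.distributed_model(model)
  opt = fleet.distributed_optimizer(opt)   # now wraps paddle.optimizer.*

  loss = model(paddle.randn([4, 10])).mean()
  loss.backward()
  opt.step()
  opt.clear_grad()
  ```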
- 16 Oct, 2020 (2 commits)
- 15 Oct, 2020 (3 commits)
- Committed by tangwei12:
  * add size method for large scale
  * add large scale UT
  * add UT for checkpoint
- Committed by 123malin:
  * test=develop, fix geo sgd communicator
  * test=develop, gloo_init_method
  * test=develop, bug fix for gloo http_init
- Committed by danleifeng:
  * raise an error if multiple cards are used in fleet non_distributed mode; test=develop
- 14 Oct, 2020 (3 commits)
- Committed by Chengmo:
  * add sparse tensor load method
- Committed by 123malin:
  * test=develop, bug fix for parameter_recv
  * test=develop, for unittest, test_fleet_rolemaker_new
- Committed by Chen Weihang
- 13 Oct, 2020 (3 commits)
- Committed by WangXi
- Committed by mapingshuo:
  * support gradient merge with recompute, test=develop (sketch below)
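
  Gradient merge accumulates gradients over k micro-batches before applying
  one update, while recompute saves memory by re-running forward segments
  during backward; this commit lets the two fleet strategies compose. A
  hedged sketch of turning both on (the checkpoint tensor names are
  hypothetical):

  ```python
  import paddle.distributed.fleet as fleet

  strategy = fleet.DistributedStrategy()

  strategy.recompute = True
  strategy.recompute_configs = {
      "checkpoints": ["fc_0.tmp_0", "fc_1.tmp_0"],   # hypothetical names
  }

  strategy.gradient_merge = True
  strategy.gradient_merge_configs = {"k_steps": 4,   # merge 4 micro-batches
                                     "avg": True}    # average instead of sum
  ```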
- Committed by Chengmo:
  * refine fleetrun.ps_launch
  * update fleetrun for multi-device support
  * ps_graph supports ps-gpu
  * fix heter save and add heter save unittest
  * fix unittests & simplify code
  * update and fix fleetrun
  * fix launch barrier
  * fix role maker and add paddlecloud rolemaker unittest
  * rename heter_worker_device_guard
- 12 Oct, 2020 (1 commit)
- Committed by WangXi
- 30 Sep, 2020 (2 commits)
- Committed by danleifeng:
  * fleet supports non_distributed training in dygraph mode; test=develop
- Committed by lilong12:
  * add double grad for expand, test=develop (sketch below)
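
  Double grad (grad-of-grad) support means expand's backward pass is itself
  differentiable, which higher-order uses such as gradient penalties require.
  A hedged dygraph sketch:

  ```python
  import paddle

  x = paddle.randn([2, 1])
  x.stop_gradient = False

  # expand is in the graph, so dx is computed through expand's grad op
  z = (paddle.expand(x, shape=[2, 3]) ** 2).sum()
  (dx,) = paddle.grad(z, x, create_graph=True)   # first-order grad, kept in graph
  (ddx,) = paddle.grad(dx.sum(), x)              # exercises expand's double grad
  ```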
- 29 Sep, 2020 (2 commits)
- 28 Sep, 2020 (6 commits)
- Committed by Qinghe JING:
  * set a default value for strategy in distributed_optimizer, test=develop
- Committed by yaoxuefeng
- Committed by lilong12
- Committed by 123malin:
  * test=develop, remove the netifaces dependency
- Committed by lilong12:
  * add gloo initializer, test=develop
- Committed by Dong Daxiang:
  * add "get final strategy" for users to print the final strategy