提交 · 5b77b259d82790dc5c4ca01f9688b353c3dc27fc · BaiXuePrincess / Paddle

21 1月, 2021 3 次提交
- G
  Pass device_ids info from launch to trainer. (#30632) · f5aca8fb
  由 gongweibao 提交于 1月 21, 2021
```
Pass device_ids info from launch to trainer
```
  f5aca8fb
- V
  Build praser for Hcom* operators (#30627) · d2404da7
  由 Void Main 提交于 1月 21, 2021
```
Build praser for Hcom* operators
```
  d2404da7
- G
  Add distribution supported (#30578) · f9c97dd7
  由 gongweibao 提交于 1月 21, 2021
```
Add distribution supported
```
  f9c97dd7
15 1月, 2021 2 次提交
- H
  
  Ascend rc (#30483) · 6dd52c5b
  由 hutuxian 提交于 1月 15, 2021
  
  6dd52c5b
- 1
  test=develop, fix fleet.metric (#30438) · 05f06d9a
  由 123malin 提交于 1月 15, 2021
```
* test=develop, fix fleet.metrics(mse, rmse, mae)
```
  05f06d9a
14 1月, 2021 2 次提交
- C
  fix ps init(#30397) · 859431aa
  由 Chengmo 提交于 1月 14, 2021
```
Co-authored-by: NseiriosPlus <tangwei12@baidu.com>
```
  859431aa
- 1
  test=develop, add distributed_infer (#30300) · 2a98e932
  由 123malin 提交于 1月 14, 2021
```
* test=develop, add distributed_infer
```
  2a98e932
12 1月, 2021 3 次提交
- J
  
  Recompute Offload (#30233) · 75936d83
  由 JZ-LIANG 提交于 1月 12, 2021
  
  75936d83
- T
  Fix/distributed proto (#29981) · 25f80fd3
  由 tangwei12 提交于 1月 12, 2021
```
* rename sendrecv.proto to namespace paddle.distributed

* split ps with distributed
```
  25f80fd3
- C
  【Paddle.Fleet】Support local save sparse param (#30175) · d479ae17
  由 Chengmo 提交于 1月 12, 2021
```
* add save tensor support
Co-authored-by: NseiriosPlus <tangwei12@baidu.com>
```
  d479ae17
08 1月, 2021 2 次提交
- C
  
  remove distributed prepare context (#30219) · 3016ba85
  由 Chen Weihang 提交于 1月 08, 2021
  
  3016ba85
- C
  【Paddle.Fleet】Fix tensor table (#30075) · 528e03fc
  由 Chengmo 提交于 1月 08, 2021
```
* add tensor table
```
  528e03fc
07 1月, 2021 1 次提交
- C
  Simplify the options of spawn based on fleetrun (#30144) · 8020e34e
  由 Chen Weihang 提交于 1月 06, 2021
```
* Simplify the options of spawn based on fleetrun

* polish details

* polish doc details
```
  8020e34e
06 1月, 2021 1 次提交
- G
  
  fix logs info test=develop (#30071) · 4d2a4bb2
  由 gongweibao 提交于 1月 06, 2021
  
  4d2a4bb2
05 1月, 2021 3 次提交
- W
  
  [fleet] combine amp and gradient merge, test=develop (#30086) · ab049978
  由 WangXi 提交于 1月 05, 2021
  
  ab049978
- G
  
  fix selected_gpus test=develop (#30044) · eea7090c
  由 gongweibao 提交于 1月 05, 2021
  
  eea7090c
- C
  Set FLAGS_selected_gpus for spawn (#29962) · 46c46954
  由 Chen Weihang 提交于 1月 04, 2021
```
* set flags_selectedd_gpus for spawn

* add cond for unittest

* Delete test_no_single_process_using_multi_gpus_in_spawn.py

* Update spawn.py

* Update nccl_context.cc
```
  46c46954
31 12月, 2020 2 次提交
- L
  Disable gloo by default (#29805) · b0bd93de
  由 lilong12 提交于 12月 31, 2020
```
* update, test=develop
```
  b0bd93de
- L
  add the paddle.distributed.split api (#29970) · 2bc5121d
  由 lilong12 提交于 12月 31, 2020
```
* add distributed.split, test=develop
```
  2bc5121d
25 12月, 2020 1 次提交
- L
  fix the bug in pipeline data parallelism (#29731) · 01950ceb
  由 lilong12 提交于 12月 25, 2020
```
* update, test=develop
```
  01950ceb
24 12月, 2020 1 次提交

[Feature] one ps (3/4) (#29604) · 032414ca

由 tangwei12 提交于 12月 24, 2020

* oneps (3/4)
Co-authored-by: NMrChengmo <cmchengmo@163.com>
Co-authored-by: Nmalin10 <malin10@baidu.com>
Co-authored-by: Nchengmo <chengmo@baidu.com>

032414ca

22 12月, 2020 1 次提交
- S
  Support multi-stream communication for dynamic graph distributed (#29525) · 01e2874a
  由 ShenLiang 提交于 12月 22, 2020
```
* fix fleet for multi-stream

* fix memcpy for ncclid

* use sync to solve move operation
```
  01e2874a
17 12月, 2020 1 次提交
- W
  
  fleet sync build strategy, test=develop (#29732) · 9cbcc6ca
  由 WangXi 提交于 12月 17, 2020
  
  9cbcc6ca
11 12月, 2020 1 次提交

[Sharding] add hybrid-dp feature (#29518) · d33d468f

由 JZ-LIANG 提交于 12月 11, 2020

* Sharding add hybrid-dp feature

* update sharding in distributed_strategy

* update sharding unitest

* revise code format for sharding

d33d468f

09 12月, 2020 1 次提交
- S
  Rebuild group automatically in dynamic graph distributed (#29255) · 2ef9e0e2
  由 ShenLiang 提交于 12月 09, 2020
```
* add tensor_indices in AssignGroupBySize

* add rebuild group in reducer
```
  2ef9e0e2
08 12月, 2020 1 次提交
- L
  Fix bug in gloo that gloo initialization hangs (#29447) · b122d0bb
  由 lilong12 提交于 12月 08, 2020
```
* update, test=develop
```
  b122d0bb
04 12月, 2020 1 次提交
- S
  
  support dp run single card (#29358) · 4064354a
  由 ShenLiang 提交于 12月 04, 2020
  
  4064354a
03 12月, 2020 3 次提交
- G
  
  cleanup enum test=develop (#29294) · 96de8b00
  由 gongweibao 提交于 12月 03, 2020
  
  96de8b00
- S
  
  fix warning of fleet (#29317) · 2d6aa1a5
  由 ShenLiang 提交于 12月 03, 2020
  
  2d6aa1a5
- S
  Fix doc of fleet api (#29282) · 2cd0bf57
  由 ShenLiang 提交于 12月 03, 2020
```
* fix doc, test=document_fix
```
  2cd0bf57
01 12月, 2020 2 次提交
- S
  
  Change the api of DataParallel and Fleet (#29224) · 46b73e6c
  由 ShenLiang 提交于 12月 01, 2020
  
  46b73e6c
- 1
  test=develop, fix doc (#29200) · cc9c6196
  由 123malin 提交于 12月 01, 2020
```
* fix fleet api doc
```
  cc9c6196
30 11月, 2020 2 次提交
- W
  
  optimizer amp, all use fp16 communication, overlap last comm and compute (#28957) · 0c2a51d2
  由 WangXi 提交于 11月 30, 2020
  
  0c2a51d2
- 1
  test=develop, rm pathlib (#28658) · 92817f80
  由 123malin 提交于 11月 30, 2020
```
* test=develop, rm pathlib
```
  92817f80
27 11月, 2020 4 次提交
- S
  Support dynamic graph distributed (#28997) · e2d01eb6
  由 ShenLiang 提交于 11月 27, 2020
```
* add reducer

* refine envent for memorycopy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the untest, compile problem and ddp initialize problem

* fix untest for mac & add some comments & solve the repeated param in sublayers

* fix untest for windows & fix document
```
  e2d01eb6
- C
  
  fix some docs test=develop;test=document_fix (#29159) · d576d6dd
  由 Chen Long 提交于 11月 27, 2020
  
  d576d6dd
- L
  
  update, test=develop (#29139) · 216e0856
  由 lilong12 提交于 11月 27, 2020
  
  216e0856
- L
  
  Add a flag to control whether to initialize gloo (#29150) · a1add716
  由 lilong12 提交于 11月 27, 2020
  
  a1add716
26 11月, 2020 2 次提交

S
fix InMemoryDataset doc (#28688) · cddc7096
由 ShenLiang 提交于 11月 26, 2020
```
* add Inmemorydataset
```
cddc7096

[sharding] doc, api, bug fixed (#28983) · 0dadacc4

由 JZ-LIANG 提交于 11月 26, 2020

* add lars to fleet meta optimizer

* add lamb to proto

* add lamb to fleet meta optimizer

* fixed syntax bug

* fixed syntax bug

* fixed syntax error in lamb, add config setter of lamb in distributed_strategy

* trigger unitest to rerun

* add new unitest func for lamb

* revise unitest for lars and lamb

* revise dgc meta unitest

* revise lars document in distribute_strategy

* revise lars lamb document in distributed_strategy.py

* revise lars lamb document in distributed_strategy.py

* add weight decay exclude logic to lars

* restore optimzier.py

* restore optimizer.py as develop except lars

* add epsilon and exclude fn to distributed_sttrategy

* add lars epsilon

* revise unitest for fleet lars and lamb

* revise lars lamb unitest for CI coverage

* revise lars argument api

* revise lars argument api

* revise lars argument api

* revise api doc of lars

* fix op role

* add sharding save and add_sync_comm_for_test function

* add comm_analyse to utlis

* revise sharding_utils

* add sharding saving unittest

* revise sharding utils for unittest

* revise sharding en doc

* update sharding utils api

* add doc for sharding

* fixed bug in sharding var size count

* update varsize count in sharding

* fix sharding num_nccl_comm

* Revert "fix sharding num_nccl_comm"

This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.

0dadacc4

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致