Commits · 03babe17a9763b4cb1b8120cf6ea62b6e5695022 · 机器未来 / Paddle

26 Feb, 2021 · 1 commit
- Fleet distributed strategy support pure fp16 (#30754) (#31238) · 03babe17
  Committed by WangXi on Feb 26, 2021
  
  03babe17
25 Feb, 2021 · 1 commit

Committed by tangwei12 on Feb 25, 2021

* fix entry

* fix distributed lookup table fuse case

* fix entry bug at first time

* move entry from paddle.fluid -> paddle.distributed

* fix ut with paddle.enable_static()
Co-authored-by: malin10 <malin10@baidu.com>

8177ece5

23 Feb, 2021 · 1 commit

test=develop, save/load, shrink (#30625) (#31107) · 36710ebc

Committed by tangwei12 on Feb 23, 2021

* test=develop, save/load, shrink
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
Co-authored-by: 123malin <malin10@baidu.com>

36710ebc

20 Jan, 2021 · 2 commits
- [Cherry-pic]Fix the bug in fleet amp_init. (#30606) (#30608) · 09aed38d
  Committed by Zhen Wang on Jan 20, 2021
```
* Fix the bug in fleet amp_init.

* Fix the amp_init unit test.
```
  09aed38d
- [cherry pick]Add pure fp16 amp_init for fleet API. (#30592) · 3317cf01
  Committed by huangxu96 on Jan 20, 2021
```
* add fleet amp.init()

* add unittest for fleet_amp_init
```
  3317cf01
19 Jan, 2021 · 2 commits
- Ascend Framework Part3: Ascend Parser (#30391) (#30549) · 88c30b75
  Committed by hutuxian on Jan 19, 2021
  
  88c30b75
- 【Cherry-Pick】add trainer number for pserver (#30524) · 3bdf1544
  Committed by tangwei12 on Jan 19, 2021
```
* add trainers for pserver

Change-Id: I99c0ab1cc427318f1f9bf8f8f5faff2b8890645d

* add trainers for pserver

Change-Id: I1a75793ec81ce126d07f4c47cae09b95d530bbc8
```
  3bdf1544
18 Jan, 2021 · 1 commit
- test=develop, fix fleet.metric (#30438) (#30473) · 2c3799d1
  Committed by 123malin on Jan 18, 2021
```
* test=develop, fix fleet.metrics(mse, rmse, mae)
```
  2c3799d1
15 Jan, 2021 · 1 commit

【Cherry-Pick】add distributed_infer (#30300) (#30427) · ae75affd

Committed by 123malin on Jan 15, 2021

* test=develop, add distributed_infer (#30300)

* test=develop, add distributed_infer

* test=develop, fix unittest cmakefile conflict

* test=develop, fix test_dist_fleet_base

ae75affd

14 Jan, 2021 · 1 commit
- fix (#30399) · e1bad4d7
  Committed by Chengmo on Jan 14, 2021
```
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
```
  e1bad4d7
13 Jan, 2021 · 2 commits
- Recompute Offload (#30233) (#30372) · 3fbc3cf4
  Committed by JZ-LIANG on Jan 13, 2021
  
  3fbc3cf4
- split ps with distributed (#30337) · a97ca56a
  Committed by tangwei12 on Jan 13, 2021
```
Change-Id: I3c788e7576688e63181e7f01562529b85a09cc59
```
  a97ca56a
12 Jan, 2021 · 2 commits

【Cherry-Pick】Fix device_context & Save Tensor & Gloo (#30336) · 284bae99

Committed by Chengmo on Jan 12, 2021

* Fix server.h include device_context (#30243)

* fix cmake
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* 【Paddle.Fleet】Support local save sparse param (#30175)

* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>

* add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)

* add sparse embedding & load vars for 2.0

Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b

* fix hdfs gloo

Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6

* fix gloo hdfs

Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e

* move loadvar/sparse embedding from incubute to static

Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
Co-authored-by: tangwei12 <tangwei12@baidu.com>

284bae99

cherry pick tensor table (#30221) · 330aea6e
Committed by Chengmo on Jan 12, 2021

330aea6e

11 Jan, 2021 · 1 commit

[cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f

Committed by WangXi on Jan 11, 2021

* Optimization grad merge performance (#29784)

* [fleet] combine amp and gradient merge, test=develop (#30086)

* fix assign_op_xpu concat_op_xpu warining (#30120)
Co-authored-by: liuyuhui <liuyuhui@baidu.com>

e283dc6f

08 Jan, 2021 · 1 commit

[Cherry-pick] Simplify the options of spawn based on fleetrun (#30144) (#30197) · 39204d56

Committed by Chen Weihang on Jan 07, 2021

* Simplify the options of spawn based on fleetrun (#30144)

* Simplify the options of spawn based on fleetrun

* polish details

* polish doc details

* cleanup enum test=develop (#29294)
Co-authored-by: gongweibao <weibao.gong@gmail.com>

39204d56

06 Jan, 2021 · 1 commit
- Cherrypick 30071 (#30074) · 19bec2fe
  Committed by gongweibao on Jan 06, 2021
```
* fix log test=release/2.0

* fix ut test=develop
```
  19bec2fe
05 Jan, 2021 · 1 commit
- fix test=release/2.0 (#30045) · 6e2066b0
  Committed by gongweibao on Jan 05, 2021
  
  6e2066b0
31 Dec, 2020 · 2 commits
- fix the bug in pipeline data parallelism (#29731) (#29918) · f0e04e1f
  Committed by lilong12 on Dec 31, 2020
```
* update, test=develop
```
  f0e04e1f
- [Cherry-pick] Disable gloo by default #29559 #29805 (#29601) · 640f8cf0
  Committed by lilong12 on Dec 31, 2020
```
* update, test=develop (#29559)

* Disable gloo by default (#29805)

* update, test=develop

* update, test=develop
```
  640f8cf0
25 Dec, 2020 · 1 commit

2 0 ps core 2 (#29894) · f781ab08

Committed by tangwei12 on Dec 25, 2020

* add ps table (#29463)

* add ps table

Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178

* add service (#29560)

* add service, remove ut on mac

* fix heter_profiler & add heter stop method

* fix code style

* merge pscore

Change-Id: Ie7f60d1cdde6755a0c29db26863c6283e9843d57

* fix cmake

Change-Id: I6773509a7b4ca79139ecc40b7bf3eb318ceff8bb

* fix conflit

Change-Id: I35575be0c96a8520f9d756ea7f1ff0b904a165ba

* fix conflit

Change-Id: Ic926ea0b0d67803226d51241397ba3b510226bfa

f781ab08

22 Dec, 2020 · 2 commits
- Support multi-stream communication for dynamic graph distributed (#29525) (#29821) · f7a598fa
  Committed by ShenLiang on Dec 22, 2020
```
* fix fleet for multi-stream

* fix memcpy for ncclid

* use sync to solve move operation
```
  f7a598fa
- fleet sync build strategy, test=develop (#29732) (#29745) · f8888a07
  Committed by WangXi on Dec 22, 2020
  
  f8888a07
17 Dec, 2020 · 1 commit

[cherry-pick]fix matmulv2 bug & add rebuild group & fix bug of download (#29726) · df0430dc

Committed by ShenLiang on Dec 17, 2020

* Fix the dowanload bug in the case of multiple machines (#29551)

* fix the dowanload bug
* add sort for ips

* Fix bug of matmul_v2 for broadcast case (#29599)

* fix bug of matmul_v2 for broadcast

* Rebuild group automatically in dynamic graph distributed (#29255)

* add tensor_indices in AssignGroupBySize

* add rebuild group in reducer

* fix error message of gather nd (#29521)

df0430dc

16 Dec, 2020 · 1 commit

[2.0/cherrypick] cherry-pick Sharding PR:29518 (#29593) · ab04bf01

Committed by JZ-LIANG on Dec 16, 2020

* Sharding add hybrid-dp feature

* update sharding in distributed_strategy

* update sharding unitest

* revise code format for sharding

ab04bf01

08 Dec, 2020 · 1 commit
- [Cherry-pick] Fix bug in gloo that gloo initialization hangs (#29449) · d8e1e50a
  Committed by lilong12 on Dec 08, 2020
```
* update, test=develop (#29331)
```
  d8e1e50a
04 Dec, 2020 · 1 commit
- support dp run single card (#29358) (#29372) · b6bc4cb5
  Committed by ShenLiang on Dec 04, 2020
  
  b6bc4cb5
03 Dec, 2020 · 2 commits
- [Cherry-Pick]Fix reducer warning & fix doc of fleet (#29333) · afa50f45
  Committed by ShenLiang on Dec 03, 2020
```
* fix the warning of reducer (#29323)

* fix warning of fleet (#29317)

* Fix doc of fleet api (#29282)
```
  afa50f45
- [cherry-pick]Change the api of DataParallel and Fleet (#29288) · ec57656e
  Committed by ShenLiang on Dec 03, 2020
```
* Change the api of DataParallel and Fleet (#29224)
```
  ec57656e
01 Dec, 2020 · 1 commit
- test=develop, fix doc (#29200) · cc9c6196
  Committed by 123malin on Dec 01, 2020
```
* fix fleet api doc
```
  cc9c6196
30 Nov, 2020 · 2 commits
- optimizer amp, all use fp16 communication, overlap last comm and compute (#28957) · 0c2a51d2
  Committed by WangXi on Nov 30, 2020
  
  0c2a51d2
- test=develop, rm pathlib (#28658) · 92817f80
  Committed by 123malin on Nov 30, 2020
```
* test=develop, rm pathlib
```
  92817f80
27 Nov, 2020 · 3 commits

Support dynamic graph distributed (#28997) · e2d01eb6

Committed by ShenLiang on Nov 27, 2020

* add reducer

* refine envent for memorycopy

* add concat&split for allreduce

* apply concat & split for fuse tensor

* fix nccl dep

* fix the untest, compile problem and ddp initialize problem

* fix untest for mac & add some comments & solve the repeated param in sublayers

* fix untest for windows & fix document

e2d01eb6

fix some docs test=develop;test=document_fix (#29159) · d576d6dd
Committed by Chen Long on Nov 27, 2020

d576d6dd
Add a flag to control whether to initialize gloo (#29150) · a1add716
Committed by lilong12 on Nov 27, 2020

a1add716

26 Nov, 2020 · 5 commits

fix InMemoryDataset doc (#28688) · cddc7096
Committed by ShenLiang on Nov 26, 2020
```
* add Inmemorydataset
```
cddc7096

[sharding] doc, api, bug fixed (#28983) · 0dadacc4

Committed by JZ-LIANG on Nov 26, 2020

* add lars to fleet meta optimizer

* add lamb to proto

* add lamb to fleet meta optimizer

* fixed syntax bug

* fixed syntax bug

* fixed syntax error in lamb, add config setter of lamb in distributed_strategy

* trigger unitest to rerun

* add new unitest func for lamb

* revise unitest for lars and lamb

* revise dgc meta unitest

* revise lars document in distribute_strategy

* revise lars lamb document in distributed_strategy.py

* revise lars lamb document in distributed_strategy.py

* add weight decay exclude logic to lars

* restore optimzier.py

* restore optimizer.py as develop except lars

* add epsilon and exclude fn to distributed_sttrategy

* add lars epsilon

* revise unitest for fleet lars and lamb

* revise lars lamb unitest for CI coverage

* revise lars argument api

* revise lars argument api

* revise lars argument api

* revise api doc of lars

* fix op role

* add sharding save and add_sync_comm_for_test function

* add comm_analyse to utlis

* revise sharding_utils

* add sharding saving unittest

* revise sharding utils for unittest

* revise sharding en doc

* update sharding utils api

* add doc for sharding

* fixed bug in sharding var size count

* update varsize count in sharding

* fix sharding num_nccl_comm

* Revert "fix sharding num_nccl_comm"

This reverts commit d51587c15e9323acf226ddd36154275f0d1daf76.

0dadacc4

fix the bug in gloo (#29112) · 2a864c70
Committed by lilong12 on Nov 26, 2020
```
* update, test=develop
```
2a864c70
Fix multi nccl comm & wait server ready (#28663) · e931c7ba
Committed by WangXi on Nov 26, 2020

e931c7ba
Clean up the redundant files and unify the launch interface. (#28928) · 1358397e
Committed by gongweibao on Nov 26, 2020

1358397e

机器未来 / Paddle is in sync with the upstream project it was forked from.