Commits · 0589ed21b66872b9f333b77d860eab5202df6d26 · BaiXuePrincess / Paddle

01 Apr, 2021 · 2 commits
- LOG CLEAN (#31819) · 0589ed21
  Committed by tangwei12 on Apr 01, 2021
```
* upgrade vlog

* train from dataset fetch optimize
```
- new group (#31682) · 07741593
  Committed by kuizhiqing on Apr 01, 2021
```
* new group

* ci compatible fix

* assert nccl
```
31 Mar, 2021 · 1 commit
- Adjust pipeline optimizer for 3d parallelism (#31939) · 695dd371
  Committed by lilong12 on Mar 31, 2021
```
* update, test=develop
```
26 Mar, 2021 · 1 commit
- [3D-parallel] Reformat pipeline parallel (#31786) · c3974d0e
  Committed by lilong12 on Mar 26, 2021
```
* update, test=develop
```
25 Mar, 2021 · 1 commit
- 【Paddle.Fleet】fix dataset zip py3 bug (#31441) · f58cb018
  Committed by Chengmo on Mar 25, 2021
```
* fix zip py3 bug
```
22 Mar, 2021 · 1 commit
- [3D-parallel] add 1f1b scheduler for pipeline (#31566) · a501a7b0
  Committed by lilong12 on Mar 22, 2021
```
* add 1f1b scheduler for pp, test=develop
```
18 Mar, 2021 · 1 commit
- 【Paddle.Fleet】Fix one ps gradient clip (#31664) · 09482dde
  Committed by Chengmo on Mar 18, 2021
```
* fix one ps gradient clip
```
15 Mar, 2021 · 1 commit
- fix amp bug of fleet (#31532) · c3634c6b
  Committed by ShenLiang on Mar 15, 2021
10 Mar, 2021 · 1 commit
- remove the send/recv of tensor size (#31460) · 0205e9f8
  Committed by lilong12 on Mar 10, 2021
```
* remove the send/recv of tensor size, but users have to specify the shape of the received var explicitly.
```
05 Mar, 2021 · 1 commit
- [Kunlun]Multi xpu dygraph performance optimization , add distributed.spawn... · 9ebf05b0
  Committed by liuyuhui on Mar 05, 2021
```
[Kunlun]Multi xpu dygraph performance optimization , add distributed.spawn support for multi xpu and some bug-fixes (#31130)
```
02 Mar, 2021 · 1 commit
- topo and memory performance for heterps (#30440) · d1075df2
  Committed by danleifeng on Mar 02, 2021
```
* topo and memory performance for heterps; test=develop
* add trainwithprofiler in heter trainier; test=develop
```
24 Feb, 2021 · 2 commits
- align the default value of some configuration for fleet to that of single cards (#30740) · dc8dfba3
  Committed by lilong12 on Feb 24, 2021
```
* update, test=develop
```
- fix entry (#31079) · ebbdf525
  Committed by tangwei12 on Feb 24, 2021
```
* fix entry

* fix distributed lookup table fuse case

* fix entry bug at first time

* move entry from paddle.fluid -> paddle.distributed

* fix ut with paddle.enable_static()
Co-authored-by: malin10 <malin10@baidu.com>
```
20 Feb, 2021 · 1 commit
- test=develop, save/load, shrink (#30625) · 16b4260b
  Committed by 123malin on Feb 20, 2021
```
* test=develop, save/load, shrink
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
```
05 Feb, 2021 · 1 commit
- [Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858) · 4a8b8b45
  Committed by liuyuhui on Feb 05, 2021
03 Feb, 2021 · 1 commit
- 【kunlun】dygraph supports multi xpu card training (#30671) · b1026f64
  Committed by WangXi on Feb 03, 2021
01 Feb, 2021 · 1 commit
- Fleet distributed strategy support pure fp16 (#30754) · 31ed9c9e
  Committed by WangXi on Feb 01, 2021
21 Jan, 2021 · 1 commit
- Fix the bug in fleet amp_init. (#30606) · 4a9de931
  Committed by Zhen Wang on Jan 21, 2021
```
* Fix the bug in fleet amp_init.

* Fix the amp_init unit test.
```
20 Jan, 2021 · 3 commits
- Add fleet amp_init() (#30572) · 13862008
  Committed by huangxu96 on Jan 20, 2021
```
* add fleet amp.init()

* add unittest for fleet_amp_init
```
- fix the bug of all_reduce pipeline gradient multiple times (#30437) · 8126a41d
  Committed by lilong12 on Jan 20, 2021
```
* update, test=develop
```
- add trainers for pserver (#30523) · c9e78a22
  Committed by tangwei12 on Jan 20, 2021
```
* add trainers for pserver

Change-Id: I1a75793ec81ce126d07f4c47cae09b95d530bbc8
```
18 Jan, 2021 · 1 commit
- Ascend Framework Part3: Ascend Parser (#30391) · 9fec1618
  Committed by hutuxian on Jan 18, 2021
15 Jan, 2021 · 1 commit
- test=develop, fix fleet.metric (#30438) · 05f06d9a
  Committed by 123malin on Jan 15, 2021
```
* test=develop, fix fleet.metrics(mse, rmse, mae)
```
14 Jan, 2021 · 2 commits
- fix ps init(#30397) · 859431aa
  Committed by Chengmo on Jan 14, 2021
```
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
```
- test=develop, add distributed_infer (#30300) · 2a98e932
  Committed by 123malin on Jan 14, 2021
```
* test=develop, add distributed_infer
```
12 Jan, 2021 · 3 commits
- Recompute Offload (#30233) · 75936d83
  Committed by JZ-LIANG on Jan 12, 2021
- Fix/distributed proto (#29981) · 25f80fd3
  Committed by tangwei12 on Jan 12, 2021
```
* rename sendrecv.proto to namespace paddle.distributed

* split ps with distributed
```
- 【Paddle.Fleet】Support local save sparse param (#30175) · d479ae17
  Committed by Chengmo on Jan 12, 2021
```
* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
```
08 Jan, 2021 · 2 commits
- remove distributed prepare context (#30219) · 3016ba85
  Committed by Chen Weihang on Jan 08, 2021
- 【Paddle.Fleet】Fix tensor table (#30075) · 528e03fc
  Committed by Chengmo on Jan 08, 2021
```
* add tensor table
```
07 Jan, 2021 · 1 commit
- Simplify the options of spawn based on fleetrun (#30144) · 8020e34e
  Committed by Chen Weihang on Jan 06, 2021
```
* Simplify the options of spawn based on fleetrun

* polish details

* polish doc details
```
06 Jan, 2021 · 1 commit
- fix logs info test=develop (#30071) · 4d2a4bb2
  Committed by gongweibao on Jan 06, 2021
05 Jan, 2021 · 3 commits
- [fleet] combine amp and gradient merge, test=develop (#30086) · ab049978
  Committed by WangXi on Jan 05, 2021
- fix selected_gpus test=develop (#30044) · eea7090c
  Committed by gongweibao on Jan 05, 2021
- Set FLAGS_selected_gpus for spawn (#29962) · 46c46954
  Committed by Chen Weihang on Jan 04, 2021
```
* set flags_selectedd_gpus for spawn

* add cond for unittest

* Delete test_no_single_process_using_multi_gpus_in_spawn.py

* Update spawn.py

* Update nccl_context.cc
```
31 Dec, 2020 · 2 commits
- Disable gloo by default (#29805) · b0bd93de
  Committed by lilong12 on Dec 31, 2020
```
* update, test=develop
```
- add the paddle.distributed.split api (#29970) · 2bc5121d
  Committed by lilong12 on Dec 31, 2020
```
* add distributed.split, test=develop
```
25 Dec, 2020 · 1 commit
- fix the bug in pipeline data parallelism (#29731) · 01950ceb
  Committed by lilong12 on Dec 25, 2020
```
* update, test=develop
```
24 Dec, 2020 · 1 commit
- [Feature] one ps (3/4) (#29604) · 032414ca
  Committed by tangwei12 on Dec 24, 2020
```
* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>
```
22 Dec, 2020 · 1 commit
- Support multi-stream communication for dynamic graph distributed (#29525) · 01e2874a
  Committed by ShenLiang on Dec 22, 2020
```
* fix fleet for multi-stream

* fix memcpy for ncclid

* use sync to solve move operation
```

BaiXuePrincess / Paddle is in sync with the upstream repository it was forked from.