Commits · 7f2aa2db3c69cb9ebb8bae9e19280e75f964e1d0 · OPTHREE / Paddle

30 Aug, 2020 1 commit
- 【paddle.fleet】Support Heter Parameter Server (#25998) · 7f2aa2db
  Committed by Chengmo on Aug 30, 2020
```
* Support Heter Parameter Server
```
13 Aug, 2020 1 commit
- 【paddle.fleet】paddle.fleet -> paddle.distributed.fleet. (#26186) · 50a5bcfc
  Committed by Dong Daxiang on Aug 13, 2020
```
* move paddle.fleet to paddle.distributed.fleet
```
07 Aug, 2020 1 commit

【paddle.fleet】fleet_util move to paddle.fleet (#25805) · 2191a083

Committed by 123malin on Aug 07, 2020

* test=develop,test=document_fix, remove the out args

* fleet_util move to paddle.fleet
Co-authored-by: WuHaobo <wuhaobo1994@gmail.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

06 Aug, 2020 1 commit

add heter ps mode (#25682) · 0cb60c70

Committed by Thunderbrook on Aug 06, 2020

* add heter ps mode

* code style
test=develop

* add with_pslib
test=develop

* unittest
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* test monitor
test=develop

* prepare trainer
test=develop

* code style
test=develop


30 Jul, 2020 1 commit

Integrated Trainer of Parameter Server (API add... · caa90a65

Committed by tangwei12 on Jul 30, 2020

Integrated Trainer of Parameter Server (API add `fluid.contrib.layers.sparse_embedding` only) (#22957)

* Integrated Trainer of Parameter Server

14 Apr, 2020 1 commit
- Fix judge pslib transpiler (#23720) · ddd60444
  Committed by Chengmo on Apr 14, 2020
```
* fix judge pslib & transpiler
```
18 Mar, 2020 1 commit
- Revert "Integrated API of Parameter Server (#22710)" test=develop (#23071) · c4a6a0e2
  Committed by tangwei12 on Mar 18, 2020
```
This reverts commit 66fce9e8.
```
17 Mar, 2020 1 commit
- Integrated API of Parameter Server (#22710) · 66fce9e8
  Committed by tangwei12 on Mar 17, 2020
```
Fleet Parameter Server API Integrated
```
02 Feb, 2020 1 commit
- add GeneralRoleMaker (#22295) · 371f377b
  Committed by xujiaqi01 on Feb 02, 2020
```
* add GeneralRoleMaker which is for general usage
* test=develop
```
26 Nov, 2019 1 commit
- Fix some typos in AMP. (#21354) · be2e3e67
  Committed by Zhen Wang on Nov 26, 2019
```
* fix some typos in AMP. test=develop

* delete useless codes. test=develop
```
10 Sep, 2019 1 commit
- Fix float16 optimizer. (#19682) · 6c2bc29c
  Committed by gongweibao on Sep 10, 2019
```
Fix float16 optimizer
```
06 Sep, 2019 1 commit
- Optimize fleet API: add input check for some interfaces (#18971) · a25a716e
  Committed by 123malin on Sep 06, 2019
```
* fleet api add input check, test=develop
```
16 Aug, 2019 1 commit
- Remove node_num function. (#19167) · 86f05911
  Committed by gongweibao on Aug 16, 2019
```
node_num is not needed by users, so remove it and fix the related bugs.
```
12 Aug, 2019 1 commit
- Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
02 Aug, 2019 1 commit

support filelist size < trainer num && fix pull dense (#18956) · 02c370c3

Committed by jiaqi on Aug 02, 2019

* support filelist size < trainer num
* pull dense params on stop so that local dense params match the pserver, and saving a paddle model saves the same dense model as the pserver
* enable QueueDataset to train on the same filelist several times

22 Jul, 2019 1 commit
- do some odd jobs (#18641) · d8458483
  Committed by tangwei12 on Jul 22, 2019
```
do some odd jobs, test=develop
```
27 Jun, 2019 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O


17 Jun, 2019 2 commits

assign role_maker before use (#18137) · 23f8a4b1
Committed by Qiao Longfei on Jun 17, 2019
```
fix role_maker bug
test=develop
```

add paddle cloud role maker for customized usage, note this is only for... · 58f3e1ba

Committed by guru4elephant on Jun 17, 2019

add paddle cloud role maker for customized usage, note this is only for industrial users that have cloud environment pre-configuration (#18121)

Add paddle cloud role maker for specific cloud usage. This PR simplifies user configuration in distributed training.

12 Jun, 2019 1 commit
- fix save/load in fleet (#17675) · 101f74cb
  Committed by tangwei12 on Jun 12, 2019
```
* fix save/load in Fleet
* add UT framework of Fleet
```
23 May, 2019 1 commit
- Async exe support communicator (#17386) · 58f7695a
  Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
09 May, 2019 1 commit

Reformat fleet API (#17135) · 565d3095

Committed by tangwei12 on May 09, 2019

* fix some logic in distributed transpiler, test=develop
* reformat fleet API, test=develop

25 Apr, 2019 1 commit
- Fleet unify distributed training (#16791) · 1a4a51db
  Committed by tangwei12 on Apr 25, 2019
```
* implement distributed transpiler with fleet
```
