Commits · 6c2bc29cc06b74153d0c5e3af43e7a011a27df71 · BaiXuePrincess / Paddle

10 Sep, 2019 · 1 commit
- Fix float16 optimizer. (#19682) · 6c2bc29c
  Committed by gongweibao on Sep 10, 2019
```
Fix float16 optimizer
```
06 Sep, 2019 · 1 commit
- Optimize fleet API: add input check for some interfaces (#18971) · a25a716e
  Committed by 123malin on Sep 06, 2019
```
* fleet api add input check, test=develop
```
16 Aug, 2019 · 1 commit
- Remove node_num function. (#19167) · 86f05911
  Committed by gongweibao on Aug 16, 2019
```
node_num is not needed by users, so remove it and fix the related bugs.
```
12 Aug, 2019 · 1 commit
- Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
02 Aug, 2019 · 1 commit

support filelist size < trainer num && fix pull dense (#18956) · 02c370c3

Committed by jiaqi on Aug 02, 2019

* support filelist size < trainer num
* pull dense params when stopping, so that local dense params are the same as on the pserver and the saved Paddle model contains the same dense params as the pserver
* enable QueueDataset to train on the same filelist several times
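The first bullet concerns splitting a filelist across trainers when there are fewer files than trainers. The hypothetical helper `split_filelist` below illustrates one way to do this; it is not Paddle's code. Each trainer gets a contiguous slice, the remainder goes to the lowest-ranked trainers, and surplus trainers receive an empty list instead of triggering an error. Whether the commit hands out empty lists or reuses files in this case is an assumption that the message does not settle.

```python
from typing import List


def split_filelist(files: List[str], trainer_id: int, trainer_num: int) -> List[str]:
    """Return the contiguous slice of `files` assigned to `trainer_id`.

    Hypothetical helper: the first len(files) % trainer_num trainers get one
    extra file, and when there are fewer files than trainers the surplus
    trainers receive an empty list instead of raising an error.
    """
    if trainer_num <= 0:
        raise ValueError("trainer_num must be positive")
    if not 0 <= trainer_id < trainer_num:
        raise ValueError("trainer_id must be in [0, trainer_num)")
    blocksize, remainder = divmod(len(files), trainer_num)
    # Every earlier trainer took `blocksize` files, plus one extra file for
    # each earlier trainer that falls inside the remainder.
    begin = trainer_id * blocksize + min(trainer_id, remainder)
    size = blocksize + (1 if trainer_id < remainder else 0)
    return files[begin:begin + size]


if __name__ == "__main__":
    files = ["part-0", "part-1", "part-2"]
    for tid in range(5):
        # 3 files, 5 trainers: trainers 3 and 4 get empty lists.
        print(tid, split_filelist(files, tid, 5))
```

The slices are disjoint and together cover every file exactly once, so no file is trained twice within one pass.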

25 Jul, 2019 · 1 commit
- refine launch_ps and role_maker (#18795) · 30562e37
  Committed by guru4elephant on Jul 25, 2019
```
refine launch_ps and role_maker
```
22 Jul, 2019 · 1 commit
- do some odd jobs (#18641) · d8458483
  Committed by tangwei12 on Jul 22, 2019
```
do some odd jobs, test=develop
```
10 Jul, 2019 · 1 commit
- upgrade collective fleet api (#18533) · 9c17a899
  Committed by guru4elephant on Jul 10, 2019
```
* upgrade collective fleet api
```
08 Jul, 2019 · 1 commit
- add random port (#18504) · 1f1cc222
  Committed by guru4elephant on Jul 08, 2019
```
* add random port
```
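The commit message does not say how the random port is chosen. A common approach is to bind a socket to port 0 and let the operating system pick an unused port. The sketch below assumes this approach; `find_free_port` is a hypothetical helper, not a Paddle function.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port number that was unused at the time of the call.

    Hypothetical helper: binding to port 0 asks the OS to pick any free
    ephemeral port; the socket is closed on return, so the port is released.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("picked port:", find_free_port())
```

The port is only known to be free at the moment of the check. Another process can claim it before the launcher binds it again, so callers usually retry if the later bind fails.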
02 Jul, 2019 · 1 commit
- make fleet support mpi job submit directly (#18441) · 357311fd
  Committed by guru4elephant on Jul 02, 2019
```
make fleet support mpi job submit directly.
```
27 Jun, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittests for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update to support Python 3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump for py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

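One bullet in this commit, "scale loss grad in collective sgd transpiler", has a simple rationale in data-parallel training, assuming that is its motivation. An all-reduce sums gradients across `nranks` workers, so scaling each worker's loss gradient by `1/nranks` makes the reduced result the mean gradient. Back-propagation is linear in the incoming gradient, so scaling the loss gradient scales every parameter gradient by the same factor. The sketch applies that factor to the gradients directly, using plain Python lists. The function `allreduce_sum` is a hypothetical stand-in for an NCCL all-reduce.

```python
from typing import List


def allreduce_sum(per_worker: List[List[float]]) -> List[float]:
    """Element-wise sum over workers; a hypothetical stand-in for an all-reduce."""
    return [sum(values) for values in zip(*per_worker)]


def scaled_allreduce(per_worker_grads: List[List[float]], nranks: int) -> List[float]:
    """Scale each worker's local gradient by 1/nranks, then all-reduce (sum).

    Because the all-reduce is a plain sum, pre-scaling by 1/nranks makes the
    result equal to the mean gradient across workers.
    """
    scaled = [[g / nranks for g in grads] for grads in per_worker_grads]
    return allreduce_sum(scaled)


if __name__ == "__main__":
    local_grads = [[1.0, 2.0], [3.0, 4.0]]  # two workers, two parameters
    print(scaled_allreduce(local_grads, nranks=2))  # [2.0, 3.0]
```

Scaling before the reduction rather than after it keeps the reduced values small and lets the summed result feed the optimizer unchanged.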

23 Jun, 2019 · 1 commit
- fix paddle cloud role maker bug (#18269) · ff399fd7
  Committed by guru4elephant on Jun 23, 2019
```
* fix paddle cloud role maker bug
```
17 Jun, 2019 · 2 commits

assign role_maker before use (#18137) · 23f8a4b1
Committed by Qiao Longfei on Jun 17, 2019
```
fix role_maker bug
test=develop
```

add paddle cloud role maker for customized usage, note this is only for... · 58f3e1ba

Committed by guru4elephant on Jun 17, 2019

add paddle cloud role maker for customized usage; note this is only for industrial users whose cloud environment is pre-configured (#18121)

Add paddle cloud role maker for specific cloud usage. This PR simplifies users' configuration for distributed training.
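A cloud role maker of this kind usually derives a node's role from environment variables that the cloud platform sets in advance. The hypothetical `role_from_env` function below sketches the idea. The variable names (`TRAINING_ROLE`, `PADDLE_TRAINER_ID`, `PADDLE_TRAINERS_NUM`, `PADDLE_PSERVERS_IP_PORT_LIST`) and their defaults are assumptions for illustration, not a specification of what this commit reads.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class CloudRole:
    """What a node learns about itself from the pre-configured environment."""
    role: str                     # e.g. "TRAINER" or "PSERVER" (assumed values)
    trainer_id: int
    trainers_num: int
    pserver_endpoints: List[str]  # "ip:port" strings


def role_from_env(env: Optional[Mapping[str, str]] = None) -> CloudRole:
    # Hypothetical: the variable names and defaults below are assumptions,
    # not a documented contract of Paddle's cloud role maker.
    env = os.environ if env is None else env
    endpoints = env.get("PADDLE_PSERVERS_IP_PORT_LIST", "")
    return CloudRole(
        role=env.get("TRAINING_ROLE", "TRAINER"),
        trainer_id=int(env.get("PADDLE_TRAINER_ID", "0")),
        trainers_num=int(env.get("PADDLE_TRAINERS_NUM", "1")),
        # Split the comma-separated list and drop empty entries.
        pserver_endpoints=[ep for ep in endpoints.split(",") if ep],
    )
```

Accepting a mapping instead of reading `os.environ` unconditionally keeps the function easy to test without touching the real process environment.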

12 Jun, 2019 · 1 commit
- fix save/load in fleet (#17675) · 101f74cb
  Committed by tangwei12 on Jun 12, 2019
```
* fix save/load in Fleet
* add UT framework of Fleet
```
11 Jun, 2019 · 1 commit

add UserDefinedCollectiveRoleMaker for collective mode (#17898) · b5c35ae3

Committed by lilong12 on Jun 11, 2019

* add 'UserDefinedRoleMakerNCCL' for collective mode.

* code style

* add the name UserDefinedRoleMakerNCCL to __all__

* rename to UserDefinedRoleMakerCollective

* rename to UserDefinedCollectiveRoleMaker
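This commit adds a role maker whose roles the user supplies instead of discovering them. In collective (all-reduce) mode every process is a worker and there are no parameter servers. Under that assumption, a role maker only needs this process's rank (`current_id`) and the list of all worker endpoints, as in the minimal sketch below. `CollectiveRoleSketch` and its methods are hypothetical and may not match the signature of Paddle's `UserDefinedCollectiveRoleMaker`.

```python
from typing import List


class CollectiveRoleSketch:
    """Hypothetical role holder for collective (all-reduce) training.

    Every process is a worker; there are no parameter servers, so the
    process only needs its own rank and the endpoints of all workers.
    """

    def __init__(self, current_id: int, worker_endpoints: List[str]):
        if not worker_endpoints:
            raise ValueError("worker_endpoints must not be empty")
        if not 0 <= current_id < len(worker_endpoints):
            raise ValueError("current_id must index into worker_endpoints")
        self._current_id = current_id
        self._endpoints = list(worker_endpoints)

    def worker_num(self) -> int:
        return len(self._endpoints)

    def worker_index(self) -> int:
        return self._current_id

    def is_first_worker(self) -> bool:
        # Rank 0 is conventionally the worker that saves models or logs.
        return self._current_id == 0

    def current_endpoint(self) -> str:
        return self._endpoints[self._current_id]
```

Validating `current_id` against the endpoint list at construction time surfaces a misconfigured launch immediately, instead of failing later during communicator setup.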

23 May, 2019 · 1 commit
- Async exe support communicator (#17386) · 58f7695a
  Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
15 May, 2019 · 1 commit

support config file, cvm, load, save, shrink (#17319) · 34369944

Committed by jiaqi on May 15, 2019

* support config file, cvm, load, save, shrink
test=develop

* fix error of worker_num & add table.compress_in_save
test=develop

* fix code style
test=develop

* fix save model bug
test=develop

09 May, 2019 · 1 commit

Reformat fleet API (#17135) · 565d3095

Committed by tangwei12 on May 09, 2019

* fix some logic in distributed transpiler, test=develop
* reformat fleet API, test=develop

25 Apr, 2019 · 1 commit
- Fleet unify distributed training (#16791) · 1a4a51db
  Committed by tangwei12 on Apr 25, 2019
```
* implement distributed transpiler with fleet
```
11 Apr, 2019 · 1 commit
- fix code style for incubator · ceac9df8
  Committed by dongdaxiang on Apr 10, 2019
09 Apr, 2019 · 1 commit
- move split filelist from trainer.py to fleet & fix error · d5ee580c
  Committed by xjqbest on Apr 09, 2019
```
test=develop
```
30 Mar, 2019 · 1 commit
- fix client to client communication bug · a99c8d0c
  Committed by xjqbest on Mar 30, 2019
```
test=develop
```
29 Mar, 2019 · 10 commits
- fix code style & runtime error · a38b98cb
  Committed by xjqbest on Mar 26, 2019
```
test=develop
```
- make role maker and distributed optimizer private · 17790188
  Committed by dongdaxiang on Mar 26, 2019
- fix runtime error · f5c6a14b
  Committed by xujiaqi01 on Mar 20, 2019
- support multi dataset && add init model && fix bug · a5b1a0e1
  Committed by xujiaqi01 on Mar 20, 2019
- add document for role_maker and fleet parameter, data_generator · 3c65cc1b
  Committed by dongdaxiang on Mar 19, 2019
- add trainfileswithprofiler for downpour worker · 3e38d1db
  Committed by dongdaxiang on Mar 15, 2019
- add trainfileswithprofiler for downpour worker · 6af697ad
  Committed by dongdaxiang on Mar 15, 2019
- add comment for MPI Symetric role maker · ea5851fa
  Committed by dongdaxiang on Mar 14, 2019
- add incubate for unified API · f6128777
  Committed by dongdaxiang on Mar 13, 2019
- add incubate for unified API · 3641a78b
  Committed by dongdaxiang on Mar 13, 2019

BaiXuePrincess / Paddle is up to date with the upstream project it was forked from.
