提交 · 30562e371b3c4a1934e32879a778d1021aa88138 · BaiXuePrincess / Paddle

25 7月, 2019 2 次提交
- G
  refine launch_ps and role_maker (#18795) · 30562e37
  由 guru4elephant 提交于 7月 25, 2019
```
refine launch_ps and role_maker
```
  30562e37
- F
  Fix shrink-dense and add scale-datanorm (#18746) · c167a4b4
  由 fuyinno4 提交于 7月 25, 2019
```
Fix FleetWrapper:
1. fix shrink dense: just scale show
2. add datanorm scale: divide datanorm's gradient by batch_size
```
  c167a4b4
24 7月, 2019 1 次提交

add slot to sparse table (#18686) · d8396281

由 Thunderbrook 提交于 7月 24, 2019

The change includes 2 things:

1. save delta model and shrink table are control by the same parameter before, now add delete_after_unseen_days to control shrink table.
2. value in sparse table has no slot before, now add slot in sparse table, and add DownpureCtrAccessor to support the new meta.
test=develop

d8396281

23 7月, 2019 1 次提交

support patch data, add load_one_table, fix bug (#18509) · d18aabb4

由 jiaqi 提交于 7月 23, 2019

（1）support patch data （merge slots of instances of same line id, modify dense layer which
changes its size）
（2）add fleet load_one_table interface, support load from paddle model and load from pslib model
（3）fix push sparse bug which cause push sparse cost more time（about 10% in my testcase）
（4）when some slots are not in one of your network (join/update, etc.)，data feed、collect label info、push/pull sparse will skip these slots， instead of throw error.
（5）add more debug info in TrainFilesWithProfiler

d18aabb4

22 7月, 2019 1 次提交
- T
  do some odd jobs (#18641) · d8458483
  由 tangwei12 提交于 7月 22, 2019
```
do some odd jobs, test=develop
```
  d8458483
10 7月, 2019 1 次提交
- G
  upgrade collective fleet api (#18533) · 9c17a899
  由 guru4elephant 提交于 7月 10, 2019
```
* upgrade collective fleet api
```
  9c17a899
08 7月, 2019 1 次提交
- G
  add random port (#18504) · 1f1cc222
  由 guru4elephant 提交于 7月 08, 2019
```
* add random port
```
  1f1cc222
02 7月, 2019 1 次提交
- G
  make fleet support mpi job submit directly (#18441) · 357311fd
  由 guru4elephant 提交于 7月 02, 2019
```
make fleet support mpi job submit directly.
```
  357311fd
27 6月, 2019 2 次提交

T
fix communicator with pyreader (#18350) · 999d9a59
由 tangwei12 提交于 6月 27, 2019
```
* add is_runnning in communicator, test=develop
```
999d9a59

supports collective communicated training (#18175) · b7128bac

由 HaoRen 提交于 6月 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

23 6月, 2019 1 次提交
- G
  fix paddle cloud role maker bug (#18269) · ff399fd7
  由 guru4elephant 提交于 6月 23, 2019
```
* fix paddle cloud role maker bug
```
  ff399fd7
17 6月, 2019 2 次提交

Q
assign role_maker before use (#18137) · 23f8a4b1
由 Qiao Longfei 提交于 6月 17, 2019
```
fix role_maker bug
test=develop
```
23f8a4b1

add paddle cloud role maker for customized usage, note this is only for... · 58f3e1ba

由 guru4elephant 提交于 6月 17, 2019

add paddle cloud role maker for customized usage, note this is only for industrial users that have cloud environment pre-configuration (#18121)

add paddle cloud role maker for specific cloud usage. This pr will simplifies user's configuration in distributed training.

58f3e1ba

13 6月, 2019 1 次提交
- T
  
  fix bug in fleet, test=develop (#18058) · 4c735f24
  由 tangwei12 提交于 6月 13, 2019
  
  4c735f24
12 6月, 2019 2 次提交

T
fix save/load in fleet (#17675) · 101f74cb
由 tangwei12 提交于 6月 12, 2019
```
* fix save/load in Fleet
* add UT framework of Fleet
```
101f74cb

fix logging basicConfig cannot be setting after import paddle (#17786) · 96ee528e

由 Kaipeng Deng 提交于 6月 12, 2019

* fix logging unable. test=develop

* unset sys.stdout for stream handler. test=develop

* fix newly add basicConfig. test=develop

* fix import error. test=develop

96ee528e

11 6月, 2019 1 次提交

add UserDefinedCollectiveRoleMaker for collective mode (#17898) · b5c35ae3

由 lilong12 提交于 6月 11, 2019

* add 'UserDefinedRoleMakerNCCL' for collective mode.

* code style

* add the name UserDefinedRoleMakerNCCL to __all__

* rename to UserDefinedRoleMakerCollective

* rename to UserDefinedCollectiveRoleMaker

b5c35ae3

23 5月, 2019 1 次提交
- Q
  Async exe support communicator (#17386) · 58f7695a
  由 Qiao Longfei 提交于 5月 23, 2019
```
Async exe support communicator
```
  58f7695a
17 5月, 2019 1 次提交
- J
  support sparse table get shard_num from TableParameter (#17443) · 05df39ac
  由 jiaqi 提交于 5月 17, 2019
```
test=develop
```
  05df39ac
15 5月, 2019 1 次提交

support config file, cvm, load, save, shrink (#17319) · 34369944

由 jiaqi 提交于 5月 15, 2019

* support config file, cvm, load, save, shrink
test=develop

* fix error of worker_num & add table.compress_in_save
test=develop

* fix code style
test=develop

* fix save model bug
test=develop

34369944

09 5月, 2019 1 次提交

Reformat fleet API (#17135) · 565d3095

由 tangwei12 提交于 5月 09, 2019

* fix some logic in distributed transpiler, test=develop
* reformat fleet API, test=develop

565d3095

25 4月, 2019 1 次提交
- T
  Fleet unify distributed training (#16791) · 1a4a51db
  由 tangwei12 提交于 4月 25, 2019
```
* implement distributed transpiler with fleet
```
  1a4a51db
11 4月, 2019 1 次提交
- D
  
  fix code style for incubator · ceac9df8
  由 dongdaxiang 提交于 4月 10, 2019
  
  ceac9df8
10 4月, 2019 1 次提交
- X
  add Example in doc string of split_filelist · e784884e
  由 xjqbest 提交于 4月 10, 2019
```
test=develop
```
  e784884e
09 4月, 2019 3 次提交
- X
  fix code style · 1c0ef929
  由 xjqbest 提交于 4月 09, 2019
```
test=develop
```
  1c0ef929
- X
  fix code style · 19381329
  由 xujiaqi01 提交于 4月 09, 2019
```
test=develop
```
  19381329
- X
  move split filelist from trainer.py to fleet & fix error · d5ee580c
  由 xjqbest 提交于 4月 09, 2019
```
test=develop
```
  d5ee580c
05 4月, 2019 1 次提交
- X
  fix init_worker bug · 126d2a2f
  由 xjqbest 提交于 4月 05, 2019
```
test=develop
```
  126d2a2f
04 4月, 2019 2 次提交
- X
  fix code style · 7a759d76
  由 xjqbest 提交于 4月 04, 2019
```
test=develop
```
  7a759d76
- X
  fix runtime error · 5e513928
  由 xjqbest 提交于 4月 04, 2019
```
test=develop
```
  5e513928
30 3月, 2019 1 次提交
- X
  fix client to client communication bug · a99c8d0c
  由 xjqbest 提交于 3月 30, 2019
```
test=develop
```
  a99c8d0c
29 3月, 2019 9 次提交
- X
  fix code style & runtime error · a38b98cb
  由 xjqbest 提交于 3月 26, 2019
```
test=develop
```
  a38b98cb
- D
  
  make role maker and distributed optimizer private · 17790188
  由 dongdaxiang 提交于 3月 26, 2019
  
  17790188
- X
  add doc string · d52586a9
  由 xjqbest 提交于 3月 25, 2019
```
test=develop
```
  d52586a9
- X
  
  fix bug of gen_worker_desc and set_filelist, add some doc · b7940c29
  由 xjqbest 提交于 3月 22, 2019
  
  b7940c29
- X
  
  init model support multi programs · 20b76f3d
  由 xujiaqi01 提交于 3月 20, 2019
  
  20b76f3d
- X
  
  fix runtime error · f5c6a14b
  由 xujiaqi01 提交于 3月 20, 2019
  
  f5c6a14b
- X
  
  support multi dataset && add init model && fix bug · a5b1a0e1
  由 xujiaqi01 提交于 3月 20, 2019
  
  a5b1a0e1
- D
  
  add document for role_maker and fleet parameter, data_generator · 3c65cc1b
  由 dongdaxiang 提交于 3月 19, 2019
  
  3c65cc1b
- D
  
  only allow fleet to be initialized once · a58df687
  由 dongdaxiang 提交于 3月 15, 2019
  
  a58df687

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致