Commits · 349e82d66936d007c616e4cba1394b92a29023b9 · BaiXuePrincess / Paddle

20 Nov, 2019 1 commit

support general embedding params (#21217) · 349e82d6

Committed by Thunderbrook on Nov 20, 2019

* general table

* add sparse table
test=develop

* no cvm
test=develop

* add no_cvm
test=develop

* add note
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* add key of optimizer
test=develop

349e82d6

15 Nov, 2019 2 commits

fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052) · 23876de5

Committed by xujiaqi01 on Nov 15, 2019
```
* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop
```
23876de5

add copy table (#21086) · 9e045170

Committed by xujiaqi01 on Nov 15, 2019

* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars

9e045170

04 Nov, 2019 1 commit

find lookup table in order (#20932) · 5970e8ac

Committed by Thunderbrook on Nov 04, 2019
```
test=develop
```
5970e8ac

31 Oct, 2019 2 commits

Fix Paddle Cloud role maker (#20860) · 16596f64

Committed by Chengmo on Oct 31, 2019

* fix PaddleCloud Role maker & add warning in distribute transpiler & change rpc_retry_times

16596f64

support dump param of model into afs (#20302) · 59bcdc8a

Committed by Thunderbrook on Oct 31, 2019

* support dump param to afs
test=develop

* code style
test=develop

* code style
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

59bcdc8a

25 Oct, 2019 1 commit

fix several sparse table issues (#20686) · 48669aa8

Committed by xujiaqi01 on Oct 25, 2019

* it is no longer necessary to define the embedding layers of all slots (without omitting any) in each program; make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard-coding GRAD
* support embedding stop gradient; push sparse raised an error before this fix.
* fix fill sparse: skip slots which do not have an embedding. Before this fix, each slot's embedding in a sparse table had to be used in all training programs.
* fix pull sparse: skip slots which do not have an embedding.
* fix collect feasign label info: skip slots which do not have an embedding.
* when there are multiple sparse tables in one or more training programs, each program can pull/push only its own related sparse tables instead of all sparse tables.
* test=develop

48669aa8

18 Oct, 2019 1 commit

add check nan / inf in downpour worker (#20694) · 5223b0dd

Committed by xujiaqi01 on Oct 18, 2019
```
* add check nan / inf in downpour worker during training
* test=develop
```
  5223b0dd
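
The commit above adds a NaN / Inf check to the downpour worker. The worker itself is C++ and its code is not shown on this page, so the following is only a minimal Python sketch of the general idea: scan each named variable after a step and report the ones holding non-finite values. The function name `find_non_finite` and the variable names in `batch_vars` are hypothetical and are not taken from Paddle.

```python
import math
from typing import Iterable, List, Mapping


def find_non_finite(tensors: Mapping[str, Iterable[float]]) -> List[str]:
    """Return the names of variables that contain a NaN or Inf value."""
    bad = []
    for name, values in tensors.items():
        # A single non-finite element is enough to flag the variable.
        if any(math.isnan(v) or math.isinf(v) for v in values):
            bad.append(name)
    return bad


# Hypothetical values a worker might inspect after one training step.
batch_vars = {
    "fc_0.w_0": [0.1, -0.3, 2.5],
    "fc_0.b_0": [float("nan"), 0.0],
    "loss": [float("inf")],
}
bad_vars = find_non_finite(batch_vars)
```

In a real trainer, a non-empty result would typically stop the job or skip the batch, rather than letting the bad values be pushed to the parameter server.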

15 Oct, 2019 1 commit

Fix communicator slow bug & fix communicator stop bug (#20366) · 940c6ff1

Committed by Chengmo on Oct 15, 2019

* test=develop,Fix communicator slow bug

* test=develop, delete if() in stop_worker()

* test=develop

* fix UT, test=develop

* fix bug in fetch handler, test=develop

* fix bug in fetch handler, test=develop

* test=develop, fix fetch barrier bug

* test=develop, bug fix

* test=develop, bug fix

* test=develop, fix bug

940c6ff1

14 Oct, 2019 1 commit

dump fix dov vec file num (#20539) · f76a32df

Committed by Thunderbrook on Oct 14, 2019
```
* support dump multi file
test=develop

* dump fix num file
test=develop
```
f76a32df

12 Oct, 2019 1 commit
fix converter, test=develop (#20522) · b5219920

Committed by zhang wenhui on Oct 12, 2019

b5219920

11 Oct, 2019 1 commit
fix pslib datanorm double bug (#20297) · b82e6520

Committed by zhang wenhui on Oct 11, 2019
```
* fix fc sort. test=develop
```
b82e6520

07 Oct, 2019 1 commit
fix fleet_desc delete_after_unseen_day bug in node.py (#20091) · b28d4a82

Committed by zhang wenhui on Oct 07, 2019

b28d4a82

30 Sep, 2019 1 commit
Add GEO-SGD distribute training algorithm (#20018) · 728ec1b4

Committed by Chengmo on Sep 30, 2019
```
* refactor geo sgd & communicator
```
728ec1b4

24 Sep, 2019 1 commit

support change shuffle and train thread num (#19841) · cedc0477

Committed by xujiaqi01 on Sep 24, 2019

* support change shuffle thread num
* support change train thread num
* fix receive shuffle data of each channel
* data norm stop gradient
* add check thread_tensor type and root_tensor type when merge metric
* remove sleep in shuffle, add config
* add config of pslib client to client communication
* fix xbox str
* add data norm op testcase
* add flush in trainer finalize

cedc0477

06 Sep, 2019 1 commit

Optimize fleet API: add input check for some interfaces (#18971) · a25a716e

Committed by 123malin on Sep 06, 2019
```
* fleet api add input check, test=develop
```
a25a716e

30 Aug, 2019 1 commit

add thread scope stat accurate metrics test=develop (#19480) · 10ca3f96

Committed by yaoxuefeng on Aug 30, 2019

* add thread scope stat accurate metrics test=develop

* fix style

* fix style

* fix style

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix conflict

* fix style

* fix style test=develop

* fix error test=develop

* fix error test=develop

10ca3f96

29 Aug, 2019 2 commits

support debug each output of each ins (#19004) · 1fe468d3

Committed by Thunderbrook on Aug 29, 2019

* dump slot

* test

* proto

* dump slot

* test

* proto

* code style

* code style

* code style

* style

* add delete after unseen days

* add unseen days

* code style

* conflict solve
test=develop

* add clear model

* code style
test=develop

* code style
test=develop

* support debug tensor of each ins
test=develop

* support debug tensor of each ins
test=develop

* learning rate

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style
test=develop

* code style
test=develop

* unit test

* style

* style

* multi phase

* add channel

* code style

* style

* style

* unit test

* style

* define

* define
test=develop

* style
test=develop

* rm define
test=develop

* linux

* linux
test=develop

* style
test=develop

* output format
test=develop

* windows ci
test=develop

1fe468d3

support fc sort by number, test=develop (#19466) · bd35a7f0

Committed by zhang wenhui on Aug 29, 2019
```
fleet_desc sorts fc names in dictionary order, but we want to sort them by number.
```
bd35a7f0
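
The difference between dictionary order and numeric order described in this commit can be illustrated with a short sketch. It is not the code from the commit: the helper `numeric_key` and the sample parameter names are assumptions chosen for illustration. The helper splits each name into text and digit runs and compares the digit runs as integers.

```python
import re


def numeric_key(name: str):
    """Split a name into text and digit runs; digit runs compare as integers."""
    # re.split with a capture group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p for p in parts]


# Hypothetical fc parameter names.
names = ["fc_10.w_0", "fc_2.w_0", "fc_1.w_0", "fc_0.b_0"]

lexicographic = sorted(names)               # dictionary order
by_number = sorted(names, key=numeric_key)  # numeric order
```

With plain `sorted`, `fc_10` lands before `fc_2` because the comparison is character by character; with the key function, the layer index is compared as a number, so `fc_2` comes first.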

28 Aug, 2019 1 commit

Fix the correctness of async mode at distributed training (#18863) · 65c73684

Committed by tangwei12 on Aug 28, 2019

* fix correctness of the communicator

* fix a bug in send thread when sending var context is empty, test=develop

* add lookup_table_prefetch_op and prefetch optimize, test=develop

* remove remote prefetch GPU supported

* word2vec force with CPU, test=develop

* test dist remote lookup table force with CPU, test=develop

65c73684

27 Aug, 2019 1 commit

fix fleet_desc bug && support format for abacus hotstart (#19430) · 0d794983

Committed by zhang wenhui on Aug 27, 2019
```
fix fleet_desc dense_table unsorted bug; format for abacus hotstart is not supported yet.
```
0d794983

23 Aug, 2019 1 commit
add fleet_desc config feature & multi_sparse table, test=develop (#18827) · 4a3c4b8f

Committed by zhang wenhui on Aug 23, 2019
```
add fleet_desc config feature & multi_sparse table
```
4a3c4b8f

16 Aug, 2019 1 commit
Remove node_num function. (#19167) · 86f05911

Committed by gongweibao on Aug 16, 2019
```
node_num is not needed by users, so remove it and fix the related bugs.
```
86f05911

14 Aug, 2019 3 commits

fix default value (#19193) · b86be13c

Committed by jiaqi on Aug 14, 2019
```
* fix default value in ps_pb2.py:   delta_keep_days 30 -> 16
* test=develop
```
b86be13c

add get_last_save_xbox_base/get_last_save_xbox (#19122) · b104ea06

Committed by jiaqi on Aug 14, 2019

* add get_last_save_xbox_base/get_last_save_xbox
* fix fleet_util bug of load paddle model
* add doc string in fleet api

b104ea06

fix default value of fleet desc (#19176) · bfd514c7

Committed by jiaqi on Aug 14, 2019

* fix default value of fleet desc, default values are same with jingpai
* print log when save model

bfd514c7

12 Aug, 2019 1 commit

Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812

Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
29d87812

11 Aug, 2019 1 commit

add save cache model api in fleet& add slots shuffle in dataset module & add... · 9150cf50

Committed by yaoxuefeng on Aug 11, 2019

add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871)

* add ctr related metric layer test=develop

* add save cache and slots shuffle test=develop

* add save cache and slots shuffle test=develop

* fix error

* fix error

* fix style for ci

* fix for comments

* change SlotsShuffle input to std::string for generality

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* change non-const reference to pointer

* fix style

* fix style

* fix style test=develop

* fix style  test=develop

* add return ins num in ctr metric op

* change dtype to float in metric_op.py

* fix error test=develop

* fix style test=develop

* fix API spec

* fix API spec

* fix API spec test=develop

* add UT test=develop

9150cf50

01 Aug, 2019 1 commit

adjust ins weight according to nid slot (#18784) · 768059b3

Committed by jiaqi on Aug 01, 2019
```
adjust ins weight according to nid slot; users can specify adjust_ins_weight in strategy
```
768059b3

31 Jul, 2019 1 commit

set fleet_send_batch_num a default value according to trainer num · 233746d8

Committed by jiaqi on Jul 31, 2019

(1) set a default value for fleet_send_batch_num according to the trainer num. The previous value was fixed at 80000; if the trainer num is much smaller or larger than 100, global shuffle may hit a timeout error.

(2) fix load one table bug, add barrier

233746d8

29 Jul, 2019 1 commit

add clear_model interface in fleetwrapper (#18815) · 52c1431e

Committed by Thunderbrook on Jul 29, 2019

* dump slot

* test

* proto

* dump slot

* test

* proto

* code style

* code style

* code style

* style

* add delete after unseen days

* add unseen days

* code style

* conflict solve
test=develop

* add clear model

* code style
test=develop

* code style
test=develop

52c1431e

25 Jul, 2019 1 commit

Fix shrink-dense and add scale-datanorm (#18746) · c167a4b4

Committed by fuyinno4 on Jul 25, 2019

Fix FleetWrapper:
1. fix shrink dense: just scale show
2. add datanorm scale: divide datanorm's gradient by batch_size

c167a4b4

24 Jul, 2019 1 commit

add slot to sparse table (#18686) · d8396281

Committed by Thunderbrook on Jul 24, 2019

The change includes 2 things:

1. Saving the delta model and shrinking the table were previously controlled by the same parameter; delete_after_unseen_days is now added to control table shrinking.
2. Values in the sparse table previously had no slot; a slot is now added to the sparse table, along with DownpureCtrAccessor to support the new meta.
test=develop

d8396281

23 Jul, 2019 1 commit

support patch data, add load_one_table, fix bug (#18509) · d18aabb4

Committed by jiaqi on Jul 23, 2019

(1) support patch data (merge slots of instances with the same line id, modify dense layers that change size)
(2) add fleet load_one_table interface, which supports loading from a paddle model and from a pslib model
(3) fix a push sparse bug that made push sparse take more time (about 10% in my test case)
(4) when some slots are not in one of your networks (join/update, etc.), data feed, collect label info, and push/pull sparse skip these slots instead of throwing an error
(5) add more debug info in TrainFilesWithProfiler

d18aabb4

02 Jul, 2019 1 commit

make fleet support mpi job submit directly (#18441) · 357311fd

Committed by guru4elephant on Jul 02, 2019
```
make fleet support mpi job submit directly.
```
357311fd

27 Jun, 2019 1 commit
fix communicator with pyreader (#18350) · 999d9a59

Committed by tangwei12 on Jun 27, 2019
```
* add is_runnning in communicator, test=develop
```
999d9a59

13 Jun, 2019 1 commit
fix bug in fleet, test=develop (#18058) · 4c735f24

Committed by tangwei12 on Jun 13, 2019

4c735f24

12 Jun, 2019 1 commit
fix save/load in fleet (#17675) · 101f74cb

Committed by tangwei12 on Jun 12, 2019
```
* fix save/load in Fleet
* add UT framework of Fleet
```
101f74cb

23 May, 2019 1 commit
Async exe support communicator (#17386) · 58f7695a

Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
58f7695a

17 May, 2019 1 commit
support sparse table get shard_num from TableParameter (#17443) · 05df39ac

Committed by jiaqi on May 17, 2019
```
test=develop
```
05df39ac
