Commits · 2191a08317e0465a7f3db094ecfac269a43f3285 · BaiXuePrincess / Paddle

07 Aug, 2020 1 commit

【paddle.fleet】fleet_util move to paddle.fleet (#25805) · 2191a083

Committed by 123malin on Aug 07, 2020

* test=develop,test=document_fix, remove the out args

* fleet_util move to paddle.fleet
Co-authored-by: WuHaobo <wuhaobo1994@gmail.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

2191a083

06 Aug, 2020 1 commit

add heter ps mode (#25682) · 0cb60c70

Committed by Thunderbrook on Aug 06, 2020

* add heter ps mode

* code style
test=develop

* add with_pslib
test=develop

* unittest
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* test monitor
test=develop

* prepare trainer
test=develop

* code style
test=develop

0cb60c70

11 Jul, 2020 1 commit

Fix index overflow bug of the CUDA kernel loop increment (#25435) · 0b54d54f

Committed by Chen Weihang on Jul 11, 2020

* fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop

* replace old macro & for condition, test=develop

* polish details, test=develop

0b54d54f
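
The kind of overflow this commit title describes can be illustrated with a small simulation. The sketch below is a model under stated assumptions, not the actual kernel code: it mimics a grid-stride loop of the form `for (i = start; i < n; i += stride)` and treats 32-bit signed overflow as two's-complement wraparound (formally undefined behavior in C++, but the commonly observed result). The helper names `wrap_int32` and `loop_stays_in_range` and the launch configuration are hypothetical.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range (two's-complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def loop_stays_in_range(start, n, stride, index_bits):
    """Simulate `for (i = start; i < n; i += stride)` for one thread.

    Returns False as soon as the index becomes negative, which can only
    happen when the increment overflowed the index type.
    """
    i = start
    while i < n:
        if i < 0:
            return False  # an overflowed index would address invalid memory
        i += stride
        if index_bits == 32:
            i = wrap_int32(i)
    return True


# Hypothetical launch configuration: 512 blocks of 1024 threads.
stride = 512 * 1024
n = INT32_MAX  # element count close to the int32 limit

print(loop_stays_in_range(0, n, stride, index_bits=32))
print(loop_stays_in_range(0, n, stride, index_bits=64))
```

With a 32-bit index, the last increment wraps to a negative value that still satisfies `i < n`, so the loop continues with an invalid index. Widening the loop index to 64 bits is one common remedy; whether that matches this commit's exact change is an assumption.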

10 Jun, 2020 1 commit

support CMatchAuc (#24990) · 1c224e26

Committed by hutuxian on Jun 10, 2020

Support CMatchAucCalculator based on CMatchRankAucCalculator with a new parameter ignore_rank

1c224e26
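
As a rough illustration of what an `ignore_rank` switch could mean, the sketch below filters instances either by `(cmatch, rank)` pairs or by `cmatch` alone before computing AUC. This is an assumption about the semantics, not the CMatchAucCalculator code: the tuple layout, the function names `auc` and `cmatch_auc`, and the sample values are all hypothetical, and the AUC here is a simple pairwise version rather than a bucketed one.

```python
def auc(labels, preds):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None when one class is missing.
    """
    pos = [p for l, p in zip(labels, preds) if l == 1]
    neg = [p for l, p in zip(labels, preds) if l == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cmatch_auc(instances, wanted, ignore_rank):
    """AUC over the instances selected by `wanted`.

    instances: iterable of (cmatch, rank, label, pred) tuples (assumed layout).
    wanted:    set of (cmatch, rank) pairs.
    ignore_rank: when True, match on cmatch only and disregard rank.
    """
    if ignore_rank:
        wanted_cmatch = {c for c, _ in wanted}
        kept = [(l, p) for c, r, l, p in instances if c in wanted_cmatch]
    else:
        kept = [(l, p) for c, r, l, p in instances if (c, r) in wanted]
    return auc([l for l, _ in kept], [p for _, p in kept])


# Hypothetical data: (cmatch, rank, label, prediction)
instances = [
    (222, 1, 1, 0.9),
    (222, 1, 0, 0.2),
    (222, 2, 1, 0.1),
    (223, 1, 0, 0.8),
]
print(cmatch_auc(instances, {(222, 1)}, ignore_rank=False))
print(cmatch_auc(instances, {(222, 1)}, ignore_rank=True))
```

In this toy data the rank-2 instance is only counted when `ignore_rank` is set, which changes the resulting AUC.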

04 Jun, 2020 1 commit

fix problem in dump and add log (#24891) · b8f17a04

Committed by hutuxian on Jun 04, 2020

* Fix the field length in LoD scenario
* Fix the missed lod info when copy tensor in dump field
* Add some log to make debug easy

b8f17a04

03 Jun, 2020 1 commit

Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759) · d1062d52

Committed by Chen Weihang on Jun 03, 2020

* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop

* remove ci test case, test=develop

* replace all LOG(FATAL) & polish message, test=develop

* fix typo, test=develop

* polish error info detail, test=develop

d1062d52

26 May, 2020 1 commit

fix conflict, test=develop (#24238) · 95089204

Committed by ShenLiang on May 26, 2020

95089204

25 May, 2020 1 commit

Support AucRunner in PaddleBox (#22884) · e6b87b31

Committed by hutuxian on May 25, 2020

* Support AucRunner in PaddleBox
* update some code style

e6b87b31

11 May, 2020 2 commits

change InitializeGPU to InitializeGPUAndLoadModel (#24377) · 123255cf

Committed by hutuxian on May 11, 2020

* Add InitializeGPUAndLoadModel to solve random hang when downloading sparse parameters.
* Update SaveBase to solve test problem.

123255cf

Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

30 Apr, 2020 1 commit

add timeout and http store in communication (#23436) · 1034ca31

Committed by xujiaqi01 on Apr 30, 2020

* add timeout and http store in communication, add revert and confirm in fleet
* test=develop

1034ca31

17 Apr, 2020 1 commit

support set_test_mode and set comlog level(#23905) · df64a966

Committed by hutuxian on Apr 17, 2020

df64a966

11 Apr, 2020 1 commit

add save with prefix (#23449) · d98084e7

Committed by xujiaqi01 on Apr 11, 2020

* add save with prefix
* test=develop

d98084e7

10 Apr, 2020 1 commit

Add AfsAPI in PaddleBox (#23419) · 94a3789f

Committed by hutuxian on Apr 10, 2020

* Involves AfsAPI to resolve slow downloading.
* Mainly used in PaddleBox

94a3789f

01 Apr, 2020 1 commit

add fleet pslib pull and push sparse op and push dense op (#23139) · 3a45767d

Committed by xujiaqi01 on Apr 01, 2020

* add fleet pslib pull and push sparse op and push dense op
* test=develop

3a45767d

26 Mar, 2020 2 commits

add clear one table (#23089) · 68ea1ad5

Committed by xujiaqi01 on Mar 26, 2020

* add clear_one_table
* test=develop

68ea1ad5

add MaskAucCalculator in paddlebox (#23157) · ae3bb16d

Committed by danleifeng on Mar 26, 2020

* add maskauc in paddlebox; test=develop

ae3bb16d

20 Mar, 2020 1 commit

Add need_save_delta parameter to solve OOM (#23097) · 0c30098f

Committed by hutuxian on Mar 20, 2020

0c30098f

25 Feb, 2020 1 commit

PaddleBox Framework Part2 (#22466) · 175954d8

Committed by hutuxian on Feb 25, 2020

* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for the DynamicAdjustChannelNum function to denote whether the remaining instances are discarded when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.

175954d8

11 Feb, 2020 3 commits

Paddlebox about box_wrapper (#22497) · 1a7962be

Committed by hutuxian on Feb 11, 2020

Refine PaddleBox Framework, Main functions: 
* Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
* Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
* Refactor Pull/Push Sparse Function in box_wrapper.
* Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct.
* Cache copied keys in pull sparse in order to reuse it in push period.

1a7962be
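
Of the metrics listed for MetricMsg, COPC is the simplest to state. Assuming the usual reading of the acronym (click over predicted click), it is the ratio of observed clicks to the sum of predicted click probabilities. The function below is an illustrative sketch under that assumption, not the MetricMsg implementation; the name `copc` and the sample values are hypothetical.

```python
def copc(labels, preds):
    """Click over predicted click: observed clicks / sum of predicted CTRs.

    labels: iterable of 0/1 click labels.
    preds:  iterable of predicted click probabilities, same length.
    """
    labels = list(labels)
    preds = list(preds)
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    predicted_clicks = sum(preds)
    if predicted_clicks == 0:
        raise ValueError("sum of predictions is zero; COPC is undefined")
    return sum(labels) / predicted_clicks


# Hypothetical batch: two clicks observed, two clicks predicted in total.
print(copc([1, 0, 0, 1], [0.5, 0.25, 0.25, 1.0]))
```

A value near 1 suggests the predictions are calibrated on average; values above or below 1 suggest under- or over-prediction respectively.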

multi-loss optimization by adding a DownpourOpt worker (#22025) · 2235ee1a

Committed by yaoxuefeng on Feb 11, 2020

* update

* update test=develop

* update compile set test=develop

* update compile set test=develop

* update test=develop

* update test=develop

* update test=develop

* update compile setting test=develop

* update compile setting test=develop

* update run demo test=develop

* update test=develop

* update test=develop

* fix test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update format test=develop

* update format test=develop

* update style test=develop

* update style test=develop

* change style test=develop

* change style test=develop

* change style test=develop

* add dataset unittest test=develop

* update test=develop

* update for record test=develop

* update style for record test=develop

* update for record test=develop

* update for record test=develop

* update for record test=develop

* fix format test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

2235ee1a

Compile without nccl deps. [1/2] (#22509) · a90fa540

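Committed by Wilber on Feb 11, 2020

Support compiling without a dependency on NCCL. [1/2]

With multiple cards, if Paddle is compiled without the WITH_NCCL switch enabled, the cards cannot communicate with each other, so only one card can be selected for use.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>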

a90fa540

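05 Feb, 2020 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

Added WITH_NCCL to the cmake options to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

A single-machine, single-card build can disable NCCL compilation. Multi-card use needs NCCL enabled (the default); if NCCL is disabled, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>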

7bc4b095
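
The option rules described in this commit message can be restated as a small decision function. This is only a model of the stated behavior, written in Python for illustration; it is not the CMake logic itself, and the function names `resolve_build_options` and `usable_card_count` are hypothetical.

```python
def resolve_build_options(with_gpu, with_nccl=True):
    """Model the stated rule: WITH_NCCL defaults to ON but is forced OFF
    when WITH_GPU is OFF."""
    if not with_gpu:
        with_nccl = False
    return {"WITH_GPU": with_gpu, "WITH_NCCL": with_nccl}


def usable_card_count(visible_cards, with_nccl):
    """Without NCCL the cards cannot communicate, so at most one is usable."""
    if visible_cards <= 0:
        return 0
    return visible_cards if with_nccl else 1


# A GPU build with NCCL explicitly disabled on a machine with 4 cards.
options = resolve_build_options(with_gpu=True, with_nccl=False)
print(options)
print(usable_card_count(4, options["WITH_NCCL"]))
```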

02 Feb, 2020 1 commit

add GeneralRoleMaker (#22295) · 371f377b

Committed by xujiaqi01 on Feb 02, 2020

* add GeneralRoleMaker which is for general usage
* test=develop

371f377b

14 Jan, 2020 1 commit

add collective communication library in fleet (#22211) · e3a457d3

Committed by xujiaqi01 on Jan 14, 2020

* add collective communication library in fleet to replace mpi
* test=develop

e3a457d3

20 Dec, 2019 1 commit

add table id in cache shuffle (#21585) · c3cf42d0

Committed by Thunderbrook on Dec 20, 2019

* general table

* add sparse table
test=develop

* no cvm
test=develop

* add no_cvm
test=develop

* add note
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* add key of optimizer
test=develop

* solve pslib stop core
test=develop

* barrier
test=develop

* add notes
test=develop

* add table id in cache shuffle
test=develop

* table id
test=develop

* code style
test=develop

c3cf42d0

10 Dec, 2019 1 commit

fix code style of fleet_wrapper (#21639) · c05706fe

Committed by xujiaqi01 on Dec 10, 2019

* fix code style of fleet_wrapper
* test=develop

c05706fe

25 Nov, 2019 1 commit

print table stat info for pslib (#21296) · 9a7832f8

Committed by Thunderbrook on Nov 25, 2019

* print table stat
test=develop

* notes
test=develop

* notes
test=develop

9a7832f8

21 Nov, 2019 1 commit

solve pslib core in stop worker (#21263) · 0d17c1b8

Committed by Thunderbrook on Nov 21, 2019

* general table

* add sparse table
test=develop

* no cvm
test=develop

* add no_cvm
test=develop

* add note
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* add key of optimizer
test=develop

* solve pslib stop core
test=develop

* barrier
test=develop

* add notes
test=develop

0d17c1b8

20 Nov, 2019 1 commit

support general embedding params (#21217) · 349e82d6

Committed by Thunderbrook on Nov 20, 2019

* general table

* add sparse table
test=develop

* no cvm
test=develop

* add no_cvm
test=develop

* add note
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* add key of optimizer
test=develop

349e82d6

15 Nov, 2019 2 commits

fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052) · 23876de5

Committed by xujiaqi01 on Nov 15, 2019

* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop

23876de5

add copy table (#21086) · 9e045170

Committed by xujiaqi01 on Nov 15, 2019

* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars

9e045170
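
The first two copy modes listed above (some feasigns versus all feasigns) can be sketched with plain dictionaries. This is a toy model, assuming a sparse table behaves like a mapping from feasign to embedding vector; it is not the fleet wrapper API, and the name `copy_sparse_table` is hypothetical.

```python
def copy_sparse_table(src, dst, feasigns=None):
    """Copy embeddings from `src` to `dst` and return how many were copied.

    src, dst: dicts mapping feasign (int) to embedding (list of floats).
    feasigns: iterable of feasigns to copy, or None to copy every entry.
    Feasigns that are absent from `src` are skipped.
    """
    keys = list(src.keys()) if feasigns is None else list(feasigns)
    copied = 0
    for key in keys:
        if key in src:
            dst[key] = list(src[key])  # copy the vector so tables stay independent
            copied += 1
    return copied


source_table = {1: [0.1, 0.2], 2: [0.3, 0.4]}
target_table = {}

# Copy only selected feasigns; 99 does not exist in the source and is skipped.
print(copy_sparse_table(source_table, target_table, feasigns=[2, 99]))

# Copy everything.
print(copy_sparse_table(source_table, target_table))
```

Copying the vectors rather than sharing them means later updates to one table do not silently change the other, which is the behavior one would normally expect from a table copy.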

25 Oct, 2019 1 commit

fix several sparse table issuses (#20686) · 48669aa8

Committed by xujiaqi01 on Oct 25, 2019

* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard code GRAD
* support embedding stop gradient. push sparse has error before fix this.
* fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop

48669aa8

24 Sep, 2019 1 commit

support change shuffle and train thread num (#19841) · cedc0477

Committed by xujiaqi01 on Sep 24, 2019

* support change shuffle thread num
* support change train thread num
* fix receive shuffle data of each channel
* data norm stop gradient
* add check thread_tensor type and root_tensor type when merge metric
* remove sleep in shuffle, add config
* add config of pslib client to client communication
* fix xbox str
* add data norm op testcase
* add flush in trainer finalize

cedc0477

17 Sep, 2019 1 commit

support preload thread, optimize hdfs log, fix master+patch bug (#19695) · 6bf298bf

Committed by xujiaqi01 on Sep 17, 2019

* support preload thread
* sleep before fleet wrapper exit for pslib core dump
* optimize hdfs log
* fix master+patch bug

6bf298bf

31 Aug, 2019 1 commit

Paddlebox Framework (#18982) · c756b5d2

Committed by hutuxian on Aug 31, 2019

* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile option WITH_BOX_PS and a macro PADDLE_WITH_BOX_PS.
* Add UT.
* For more details, please refer to: https://github.com/PaddlePaddle/Paddle/pull/18982

c756b5d2

14 Aug, 2019 1 commit

add get_last_save_xbox_base/get_last_save_xbox (#19122) · b104ea06

Committed by jiaqi on Aug 14, 2019

* add get_last_save_xbox_base/get_last_save_xbox
* fix fleet_util bug of load paddle model
* add doc string in fleet api

b104ea06

11 Aug, 2019 1 commit

add save cache model api in fleet& add slots shuffle in dataset module & add... · 9150cf50

Committed by yaoxuefeng on Aug 11, 2019

add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871)

* add ctr related metric layer test=develop

* add save cache and slots shuffle test=develop

* add save cache and slots shuffle test=develop

* fix error

* fix error

* fix style for ci

* fix for comments

* change SlotsShuffle input to std::string for generality

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style

* change non-const reference to pointer

* fix style

* fix style

* fix style test=develop

* fix style  test=develop

* add return ins num in ctr metric op

* change dtype to float in metric_op.py

* fix error test=develop

* fix style test=develop

* fix API spec

* fix API spec

* fix API spec test=develop

* add UT test=develop

9150cf50

29 Jul, 2019 1 commit

add clear_model interface in fleetwrapper (#18815) · 52c1431e

Committed by Thunderbrook on Jul 29, 2019

* dump slot

* test

* proto

* dump slot

* test

* proto

* code style

* code style

* code style

* style

* add delete after unseen days

* add unseen days

* code style

* conflict solve
test=develop

* add clear model

* code style
test=develop

* code style
test=develop

52c1431e

25 Jul, 2019 1 commit

Fix shrink-dense and add scale-datanorm (#18746) · c167a4b4

Committed by fuyinno4 on Jul 25, 2019

Fix FleetWrapper:
1. fix shrink dense: just scale show
2. add datanorm scale: divide datanorm's gradient by batch_size

c167a4b4
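
The second item, dividing the datanorm gradient by the batch size, amounts to a per-element scaling. The snippet below is a minimal sketch of that arithmetic only, assuming the gradient is a flat list of floats; it is not the FleetWrapper code, and the name `scale_datanorm_grad` is hypothetical.

```python
def scale_datanorm_grad(grad, batch_size):
    """Return a new gradient with every element divided by batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [g / batch_size for g in grad]


# Hypothetical gradient accumulated over a batch of 4 instances.
print(scale_datanorm_grad([4.0, 8.0, -2.0], batch_size=4))
```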
