Commits · 99626502f747cc85d518d87267cec821ffbf69a3 · 机器未来 / Paddle

18 Sep, 2020 · 1 commit

【paddle.fleet】gloo and util (#27213) · 99626502

Committed by tangwei12 on Sep 18, 2020

* fix worker endpoints

* fix gloo wrapper for hdfs

* GPU fleetrun support gloo

* parameterserver fleetrun support gloo

* fix get server endpoint

99626502

14 Sep, 2020 · 1 commit

polish framework error message part 8 (#27269) · 79149c8e

Committed by Chen Weihang on Sep 14, 2020

79149c8e

02 Sep, 2020 · 1 commit

fix eigen in push sparse; fix hadoop command (#26872) · 52057484

Committed by Thunderbrook on Sep 02, 2020

* fix eigen in push sparse; fix hadoop command
test=develop

* add log in load_combine_op
test=develop

52057484

31 Aug, 2020 · 1 commit

fleet add save with whitelist test=develop (#23376) · a47d92d8

Committed by yaoxuefeng on Aug 31, 2020

a47d92d8

27 Aug, 2020 · 1 commit

[api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552) · 1c681383

Committed by lilong12 on Aug 27, 2020

add collective op for cpu using gloo and paddle.distributed.* apis

1c681383

18 Aug, 2020 · 1 commit

fix heter proto (#26093) · a83e0f26

Committed by Thunderbrook on Aug 18, 2020

test=develop

a83e0f26

07 Aug, 2020 · 2 commits

fix compile error with mkl (#26030) · fd2947ba

Committed by Thunderbrook on Aug 07, 2020

test=develop

fd2947ba

【paddle.fleet】fleet_util move to paddle.fleet (#25805) · 2191a083

Committed by 123malin on Aug 07, 2020

* test=develop,test=document_fix, remove the out args

* fleet_util move to paddle.fleet
Co-authored-by: WuHaobo <wuhaobo1994@gmail.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

2191a083

06 Aug, 2020 · 1 commit

add heter ps mode (#25682) · 0cb60c70

Committed by Thunderbrook on Aug 06, 2020

* add heter ps mode

* code style
test=develop

* add with_pslib
test=develop

* unitest
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* test monitor
test=develop

* prepare trainer
test=develop

* code style
test=develop

0cb60c70

11 Jul, 2020 · 1 commit

Fix index overflow bug of the CUDA kernel loop increment (#25435) · 0b54d54f

Committed by Chen Weihang on Jul 11, 2020

* fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop

* replace old macro & for condition, test=develop

* polish details, test=develop
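
The kind of overflow named in this commit title can be illustrated outside CUDA. The sketch below is a plain-Python model, not Paddle's kernel code. It assumes a grid-stride loop of the form `for (i = start; i < n; i += stride)` and assumes that a 32-bit signed counter wraps around on overflow (in C++ signed overflow is undefined behaviour, so wrap-around is only the typical outcome). The helper names `wrap_int32` and `grid_stride_indices`, and all the numbers, are hypothetical and chosen only to make the effect visible.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a two's-complement 32-bit signed value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value > INT32_MAX else value


def grid_stride_indices(start: int, n: int, stride: int,
                        use_64bit: bool, max_steps: int = 8) -> list:
    """Return the indices one thread visits in a modelled grid-stride loop.

    With use_64bit=False the increment is wrapped to 32 bits, which models
    an `int` loop counter. max_steps only bounds the demonstration.
    """
    visited = []
    i = start
    while i < n and len(visited) < max_steps:
        visited.append(i)
        i = i + stride if use_64bit else wrap_int32(i + stride)
    return visited


# n itself fits in 32 bits, but index + stride does not.
n = 2_100_000_000
stride = 1_000_000_000   # hypothetical blockDim.x * gridDim.x
start = 500_000_000

narrow = grid_stride_indices(start, n, stride, use_64bit=False)
wide = grid_stride_indices(start, n, stride, use_64bit=True)

print("32-bit counter:", narrow[:3])
print("64-bit counter:", wide)
```

In this model the 32-bit counter wraps to a negative value that still satisfies `i < n`, so the loop keeps running and would read out of bounds. A wider counter lets the loop end after the second index.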

0b54d54f

10 Jun, 2020 · 1 commit

support CMatchAuc (#24990) · 1c224e26

Committed by hutuxian on Jun 10, 2020

Support CMatchAucCalculator based on CMatchRankAucCalculator with a new parameter ignore_rank

1c224e26

04 Jun, 2020 · 1 commit

fix problem in dump and add log (#24891) · b8f17a04

Committed by hutuxian on Jun 04, 2020

* Fix the field length in LoD scenario
* Fix the missed lod info when copy tensor in dump field
* Add some log to make debug easy

b8f17a04

03 Jun, 2020 · 1 commit

Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759) · d1062d52

Committed by Chen Weihang on Jun 03, 2020

* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop

* remove ci test case, test=develop

* replace all LOG(FATAL) & polish message, test=develop

* fix typo, test=develop

* polish error info detail, test=develop

d1062d52

26 May, 2020 · 1 commit

fix conflict, test=develop (#24238) · 95089204

Committed by ShenLiang on May 26, 2020

95089204

25 May, 2020 · 1 commit

Support AucRunner in PaddleBox (#22884) · e6b87b31

Committed by hutuxian on May 25, 2020

* Support AucRunner in PaddleBox
* update some code style

e6b87b31

11 May, 2020 · 2 commits

change InitializeGPU to InitializeGPUAndLoadModel (#24377) · 123255cf

Committed by hutuxian on May 11, 2020

* Add InitializeGPUAndLoadModel to solve random hang when downloading sparse parameters.
* Update SaveBase to solve test problem.

123255cf

Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

30 Apr, 2020 · 1 commit

add timeout and http store in communication (#23436) · 1034ca31

Committed by xujiaqi01 on Apr 30, 2020

* add timeout and http store in communication, add revert and confirm in fleet
* test=develop

1034ca31

17 Apr, 2020 · 1 commit

support set_test_mode and set comlog level(#23905) · df64a966

Committed by hutuxian on Apr 17, 2020

df64a966

11 Apr, 2020 · 1 commit

add save with prefix (#23449) · d98084e7

Committed by xujiaqi01 on Apr 11, 2020

* add save with prefix
* test=develop

d98084e7

10 Apr, 2020 · 1 commit

Add AfsAPI in PaddleBox (#23419) · 94a3789f

Committed by hutuxian on Apr 10, 2020

* Involves AfsAPI to resolve slow downloading.
* Mainly used in PaddleBox

94a3789f

01 Apr, 2020 · 1 commit

add fleet pslib pull and push sparse op and push dense op (#23139) · 3a45767d

Committed by xujiaqi01 on Apr 01, 2020

* add fleet pslib pull and push sparse op and push dense op
* test=develop

3a45767d

26 Mar, 2020 · 2 commits

add clear one table (#23089) · 68ea1ad5

Committed by xujiaqi01 on Mar 26, 2020

* add clear_one_table
* test=develop

68ea1ad5

add MaskAucCalculator in paddlebox (#23157) · ae3bb16d

Committed by danleifeng on Mar 26, 2020

* add maskauc in paddlebox; test=develop

ae3bb16d

20 Mar, 2020 · 1 commit

Add need_save_delta parameter to solve OOM (#23097) · 0c30098f

Committed by hutuxian on Mar 20, 2020

0c30098f

25 Feb, 2020 · 1 commit

PaddleBox Framework Part2 (#22466) · 175954d8

Committed by hutuxian on Feb 25, 2020

* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they are not be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.

175954d8

11 Feb, 2020 · 3 commits

Paddlebox about box_wrapper (#22497) · 1a7962be

Committed by hutuxian on Feb 11, 2020

Refine PaddleBox Framework, Main functions: 
* Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
* Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
* Refactor Pull/Push Sparse Function in box_wrapper.
* Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct.
* Cache copied keys in pull sparse in order to reuse it in push period.

1a7962be

multi-loss optimization by adding a DownpourOpt worker (#22025) · 2235ee1a

Committed by yaoxuefeng on Feb 11, 2020

* update

* update test=develop

* update compile set test=develop

* update compile set test=develop

* update test=develop

* update test=develop

* update test=develop

* update compile setting test=develop

* update compile setting test=develop

* update run demo test=develop

* update test=develop

* update test=develop

* fix test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update format test=develop

* update format test=develop

* update style test=develop

* update style test=develop

* change style test=develop

* change style test=develop

* change style test=develop

* add dataset unittest test=develop

* update test=develop

* update for record test=develop

* udpate style for record test=develop

* update for record test=develop

* update for record test=develop

* update for record test=develop

* fix format test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

2235ee1a

Compile without nccl deps. [1/2] (#22509) · a90fa540

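Committed by Wilber on Feb 11, 2020

Support compiling without a dependency on NCCL. [1/2]

On a multi-card machine, if the build is compiled without the WITH_NCCL switch turned on, the cards cannot communicate with each other, so only one card can be selected for use.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>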

a90fa540

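05 Feb, 2020 · 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

Added a WITH_NCCL option to cmake that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

A single-machine, single-card setup can be compiled with NCCL turned off. Multi-card use needs NCCL turned on, which is the default; if NCCL is turned off, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>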
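
The option rules described in this commit message can be summarised as a small decision function. The sketch below is only a Python model of those rules, not the project's CMake code. The function names `resolve_with_nccl` and `usable_gpu_count` are hypothetical, and returning 0 cards for a build without GPU support is an assumption added for completeness.

```python
def resolve_with_nccl(with_gpu: bool, with_nccl: bool = True) -> bool:
    """Model the effective WITH_NCCL value.

    WITH_NCCL defaults to ON, but is forced OFF when WITH_GPU is OFF.
    """
    return with_nccl and with_gpu


def usable_gpu_count(requested: int, with_gpu: bool,
                     with_nccl: bool = True) -> int:
    """Model how many cards a build could use under the stated rules."""
    if not with_gpu:
        # Assumption: a build without GPU support uses no cards at all.
        return 0
    if resolve_with_nccl(with_gpu, with_nccl):
        return requested
    # Without NCCL the cards cannot communicate, so only one is usable.
    return min(requested, 1)


for gpu, nccl in [(True, True), (True, False), (False, True)]:
    print(f"WITH_GPU={gpu} WITH_NCCL={nccl} ->",
          "effective WITH_NCCL:", resolve_with_nccl(gpu, nccl),
          "| usable cards out of 4:", usable_gpu_count(4, gpu, nccl))
```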

7bc4b095

02 Feb, 2020 · 1 commit

add GeneralRoleMaker (#22295) · 371f377b

Committed by xujiaqi01 on Feb 02, 2020

* add GeneralRoleMaker which is for general usage
* test=develop

371f377b

14 Jan, 2020 · 1 commit

add collective communication library in fleet (#22211) · e3a457d3

Committed by xujiaqi01 on Jan 14, 2020

* add collective communication library in fleet to replace mpi
* test=develop

e3a457d3

20 Dec, 2019 · 1 commit

add table id in cache shuffle (#21585) · c3cf42d0

Committed by Thunderbrook on Dec 20, 2019

* general table

* add sparse table
test=develop

* no cvm
test=develop

* add no_cvm
test=develop

* add note
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* add key of optimizer
test=develop

* solve pslib stop core
test=develop

* barrier
test=develop

* add notes
test=develop

* add table id in cache shuffle
test=develop

* table id
test=develop

* code style
test=develop

c3cf42d0

10 Dec, 2019 · 1 commit

fix code style of fleet_wrapper (#21639) · c05706fe

Committed by xujiaqi01 on Dec 10, 2019

* fix code style of fleet_wrapper
* test=develop

c05706fe

25 Nov, 2019 · 1 commit

print table stat info for pslib (#21296) · 9a7832f8

Committed by Thunderbrook on Nov 25, 2019

* print table stat
test=develop

* notes
test=develop

* notes
test=develop

9a7832f8

21 Nov, 2019 · 1 commit

solve pslib core in stop worker (#21263) · 0d17c1b8

Committed by Thunderbrook on Nov 21, 2019

* general table

* add sparse table
test=develop

* no cvm
test=develop

* add no_cvm
test=develop

* add note
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* add key of optimizer
test=develop

* solve pslib stop core
test=develop

* barrier
test=develop

* add notes
test=develop

0d17c1b8

20 Nov, 2019 · 1 commit

support general embedding params (#21217) · 349e82d6

Committed by Thunderbrook on Nov 20, 2019

* general table

* add sparse table
test=develop

* no cvm
test=develop

* add no_cvm
test=develop

* add note
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* add key of optimizer
test=develop

349e82d6

15 Nov, 2019 · 2 commits

fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052) · 23876de5

Committed by xujiaqi01 on Nov 15, 2019

* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop

23876de5

add copy table (#21086) · 9e045170

Committed by xujiaqi01 on Nov 15, 2019

* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars

9e045170

25 Oct, 2019 · 1 commit

fix several sparse table issuses (#20686) · 48669aa8

Committed by xujiaqi01 on Oct 25, 2019

* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard code GRAD
* support embedding stop gradient. push sparse has error before fix this.* 
* fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop

48669aa8

机器未来 / Paddle is in sync with its fork source project
