Commits · 8acd745c25ba2fedc1f6c5bb48f13a2de94a58f1 · PaddlePaddle / Paddle

13 Feb, 2020 6 commits
- [Ernie GPU Optim]: Fuse three fc to multihtead matmul (#22486) · 8acd745c
  Committed by Zhaolong Xing on Feb 13, 2020
```
* 1. optim multihead matmul: fuse three fc to multihtead matmul

test=develop

* fix conflict
test=develop

* fix comments
test=develop
```
  8acd745c
- Add Static Analysis to Construct AstNodeWrapper (#22569) · a8dd425a
  Committed by Huihuang Zheng on Feb 13, 2020
```
As the title
```
  a8dd425a
- Add test with reused requantize op (#22482) · 146ed409
  Committed by joanna.wozna.intel on Feb 13, 2020
```
test=develop
```
  146ed409
- Disable fusion_group for windows and mac in build_strategy. (#22549) · 96770f51
  Committed by Yiqun Liu on Feb 13, 2020
```
test=develop
```
  96770f51
- update internal header files, test=develop (#22379) · 53be3f07
  Committed by 石晓伟 on Feb 13, 2020
  
  53be3f07
- fix traced layer with non persistable vars, test=develop (#22552) · 08033c86
  Committed by Zeng Jinle on Feb 12, 2020
  
  08033c86
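
As a rough illustration of the idea behind the "Fuse three fc to multihtead matmul" commit (8acd745c) listed above: three fully connected projections that produce Q, K and V from the same input can be computed with one matrix multiplication if their weight matrices are concatenated column-wise. The sketch below is plain Python, not Paddle's fusion pass or GPU kernel. The helper names (`matmul`, `concat_columns`, `fused_qkv`) and the tiny shapes are assumptions made for illustration, and biases are left out.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def concat_columns(*mats):
    """Concatenate matrices with equal row counts side by side."""
    return [sum((list(m[i]) for m in mats), []) for i in range(len(mats[0]))]


def fused_qkv(x, w_q, w_k, w_v):
    """Compute Q, K and V with a single matmul against the concatenated weights."""
    fused = matmul(x, concat_columns(w_q, w_k, w_v))
    n_q, n_k = len(w_q[0]), len(w_k[0])
    # Slice the fused result back into the three projections.
    q = [row[:n_q] for row in fused]
    k = [row[n_q:n_q + n_k] for row in fused]
    v = [row[n_q + n_k:] for row in fused]
    return q, k, v


# Hypothetical 2x2 input and 2x2 weights.
x = [[1, 2], [3, 4]]
w_q = [[1, 0], [0, 1]]
w_k = [[2, 0], [0, 2]]
w_v = [[0, 1], [1, 0]]

q, k, v = fused_qkv(x, w_q, w_k, w_v)
```

The fused result matches three separate `matmul` calls. On a GPU the likely benefit is fewer kernel launches and a single read of the input, though the actual gains depend on the implementation in the commit.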
12 Feb, 2020 5 commits

Add support for dynamic_decode(while) training. (#22231) · 31b54646

Committed by Guo Sheng on Feb 12, 2020

* Add support for dynamic_decode(while) training. test=develop

* Fix assign_op and tensor_array_read_write_op after solving conflict. test=develop

* Fix test_rnn_decode_api.py. test=develop

* Refine docs for apis in rnn.py. test=develop

* Adjust outputs of dynamic_decode. test=develop

* Remove the force_cpu update in assign_op. test=develop

* Remove the force_cpu update in assign_op. test=develop

* Make RNNCell.get_initial_states support batch_dim_idx argument. test=develop

* Rename _create_array_outof_while as _create_array_out_of_while in rnn.py.
test=develop

31b54646

fix bug with compiledProgram (#22495) · b0675c81
Committed by tangwei12 on Feb 12, 2020
```
* add thread barrier for the compiled program
```
b0675c81

Add support for Ernie NLP model to the Slim QAT (#22506) · 4cddb43c

Committed by Wojciech Uss on Feb 12, 2020

* a test for Ernie QAT INT8 accuracy check

test=develop

* Remove NLP comparison test to split PRs

test=develop

* Fix typo and tabs, delete commented lines

test=develop

* re-combine the 2 PRs, test=develop
Co-authored-by: Michał Gallus <sand3r@interia.eu>
Co-authored-by: bingyanghuang <33643817+bingyanghuang@users.noreply.github.com>

4cddb43c

remove copying trt to inference lib, test=develop (#22470) · 5a1a9a1e
Committed by Pei Yang on Feb 12, 2020

5a1a9a1e

support slice double grad, test=develop (#22166) · 58d99247

Committed by Double_V on Feb 12, 2020

* support slice double grad, test=develop
* merge two doublegradopmaker to one doublegradopmaker,test=develop
* change the shape of slice_OP's unittest, test=develop

58d99247

11 Feb, 2020 8 commits

Paddlebox about box_wrapper (#22497) · 1a7962be

Committed by hutuxian on Feb 11, 2020

Refine PaddleBox Framework, Main functions: 
* Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
* Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
* Refactor Pull/Push Sparse Function in box_wrapper.
* Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct.
* Cache copied keys in pull sparse in order to reuse it in push period.

1a7962be
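
The commit above mentions metrics such as AUC and COPC. A minimal sketch of what these two metrics usually mean is given below. It is plain Python based on the common textbook definitions, not the `MetricMsg` class from the commit. The function names and sample data are assumptions, and COPC is taken here to mean "Click Over Predicted Click" (observed clicks divided by the sum of predicted click probabilities).

```python
def auc(labels, preds):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative prediction count as half a win.
    """
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def copc(labels, preds):
    """Observed clicks divided by the sum of predicted click probabilities."""
    return sum(labels) / sum(preds)


# Hypothetical click labels and predicted click probabilities.
labels = [1, 0, 1, 0]
preds = [0.9, 0.1, 0.4, 0.6]

auc_value = auc(labels, preds)
copc_value = copc(labels, preds)
```

A COPC close to 1 suggests the predictions are well calibrated on average. The pairwise AUC shown here is quadratic in the number of samples, so production code typically uses a bucketed or sort-based computation instead.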

【OpPorting Example】DEMO OF FIX COMPILE&RUNTIME LOD_EQUALITY (#22460) · 9e29d3eb
Committed by huzhiqiang on Feb 11, 2020

9e29d3eb
Fix bilinear import math (#22523) · 00c110f3
Committed by littletomatodonkey on Feb 11, 2020

00c110f3
Add wrong info when use DGC in cpu (#22515) · d69df9bf
Committed by WangXi on Feb 11, 2020

d69df9bf

multi-loss optimization by adding a DownpourOpt worker (#22025) · 2235ee1a

Committed by yaoxuefeng on Feb 11, 2020

* update

* update test=develop

* update compile set test=develop

* update compile set test=develop

* update test=develop

* update test=develop

* update test=develop

* update compile setting test=develop

* update compile setting test=develop

* update run demo test=develop

* update test=develop

* update test=develop

* fix test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update format test=develop

* update format test=develop

* update style test=develop

* update style test=develop

* change style test=develop

* change style test=develop

* change style test=develop

* add dataset unittest test=develop

* update test=develop

* update for record test=develop

* udpate style for record test=develop

* update for record test=develop

* update for record test=develop

* update for record test=develop

* fix format test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

2235ee1a

Improve transpose performance with tile sm copy, test=develop (#22311) · 54970444

Committed by zhaoyuchen2018 on Feb 11, 2020


* Refine code, fix select tile error,test=develop

* Refine element type and some comments, test=develop

* Refine comments and gpu utils, test=develop

* Remove some useless condition

* Refine floor and ceil, test=develop

* refine for loop. test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

54970444
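
The transpose commit above refers to copying through tiles. The general idea of a tiled transpose is to stage one small block of the input in a fast buffer (shared memory on a GPU) and then write it out transposed, so that reads and writes both touch contiguous memory. The sketch below is a CPU-side Python illustration of that access pattern only. The function name `transpose_tiled` and the tile size are assumptions, and it does not reproduce the CUDA kernel from the commit.

```python
def transpose_tiled(matrix, tile=2):
    """Transpose a list-of-rows matrix one tile at a time."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Stage one tile in a small buffer (standing in for GPU shared memory).
            buf = [matrix[r][c0:c0 + tile] for r in range(r0, min(r0 + tile, rows))]
            # Write the staged tile to its transposed position.
            for i, row in enumerate(buf):
                for j, value in enumerate(row):
                    out[c0 + j][r0 + i] = value
    return out


# Hypothetical 2x3 input; the edge tile is narrower than the tile size.
transposed = transpose_tiled([[1, 2, 3], [4, 5, 6]], tile=2)
```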

Compile without nccl deps. [1/2] (#22509) · a90fa540

Committed by Wilber on Feb 11, 2020

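Support compiling without depending on NCCL. [1/2]

In a multi-GPU setup, if Paddle is compiled without the WITH_NCCL switch turned on, the GPUs cannot communicate with each other, so only a single GPU can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>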

a90fa540

Make assign op support LoDTensorArray and modify while_loop API (#22309) · 3a59a7a1
Committed by guofei on Feb 11, 2020
```
This PR makes assign op support LoDTensorArray and enable the loop_vars in
while_loop to support tuple or list.
```
3a59a7a1

10 Feb, 2020 9 commits
- [Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3... · 54a325a5
  Committed by Zhaolong Xing on Feb 10, 2020
```
[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. (#22483)

* add int8 op teller for trt.

* refine trt int8

* add int8 op teller for trt.
test=develop
```
  54a325a5
- Implement InferencePassTest for testing precision of inference passes (#22387) · 14b6133b
  Committed by liu zhengxi on Feb 10, 2020
```
* add InterencePassTest for testing precision of inference passes, test=develop
```
  14b6133b
- add cp27-cp27m-gcc82 and cp27-cp27mu-gcc82 branch to support gcc8.2 compile... · 5739eeb9
  Committed by zhongpu on Feb 10, 2020
```
add cp27-cp27m-gcc82 and cp27-cp27mu-gcc82 branch to support gcc8.2 compile for paddle, test=develop (#22504)
```
  5739eeb9
- Fix the leaving out of rnn_memory_helper_grad's output vars. test=develop (#22499) · e7bbad6c
  Committed by Guo Sheng on Feb 10, 2020
  
  e7bbad6c
- Post_training_quantization support set quant 8/16 bits (#22492) · d143f70a
  Committed by cc on Feb 10, 2020
```
* post_training_quantization support set bits, test=develop

* up, test=develop
```
  d143f70a
- Compile without nccl deps. [2/2] (#22484) · de009152
  Committed by Wilber on Feb 10, 2020
```
Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
```
  de009152
- Fix dismatch of std::max's arguments type on windows. (#22507) · 4b2227e9
  Committed by Yiqun Liu on Feb 10, 2020
```
test=develop
```
  4b2227e9
- fix test_fusion_seqpool_concat lod level between compile and runtime (#22488) · 870f4658
  Committed by Wilber on Feb 10, 2020
  
  870f4658
- Add Simple Framework for Transforming Dygraph to Static Graph (#22491) · 903039a3
  Committed by Huihuang Zheng on Feb 10, 2020
```
This PR provides very basic and simple framework for transforming Dygraph to Static Graph.

API names, final outputs are not determined yet. Feel free to modify or add class/function/type when you think the framework is not extendable for you.
```
  903039a3
07 Feb, 2020 6 commits

Fix the integer overflow problem of sequence2batch (#22479) · a61d0952

Committed by Zhong Hui on Feb 07, 2020

Fix the integer overflow problem in the sequence2batch op: change int32_t to size_t
in /paddle/fluid/operators/math/sequence2batch.h#L122.

a61d0952
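
To see why widening from `int32_t` to `size_t` matters, the sketch below uses Python's `ctypes` to mimic how a value that exceeds the 32-bit signed range is stored. The sizes are hypothetical and are not taken from `sequence2batch.h`; the code at line 122 of that header is not shown in this log. Signed overflow is undefined behavior in C++, so the wrap-around shown here is only what typically happens in practice. `c_uint64` stands in for `size_t`, which is 64 bits wide on common 64-bit platforms.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(value):
    """Store a Python integer in a 32-bit signed C integer (wraps silently)."""
    return ctypes.c_int32(value).value


def as_uint64(value):
    """Store a Python integer in a 64-bit unsigned C integer."""
    return ctypes.c_uint64(value).value


# Hypothetical sizes: 70,000 sequences padded to 40,000 steps each.
num_sequences = 70_000
max_steps = 40_000
total = num_sequences * max_steps  # 2,800,000,000 elements

exceeds_int32 = total > INT32_MAX
wrapped = as_int32(total)    # negative after wrap-around
widened = as_uint64(total)   # preserved in the wider unsigned type
```

With a 32-bit index, an offset like `total` turns negative and would address the wrong memory, while the 64-bit unsigned value keeps the intended size.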

Add weight quantization in post_training_quanzitaion (#22445) · 197913eb

Committed by cc on Feb 07, 2020

* support weight quantization in post_training_quanzitaion, test=develop
* add test for weight quantization, test=develop

197913eb
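
For readers unfamiliar with weight quantization, the sketch below shows one common scheme, symmetric abs-max quantization, with a configurable bit width. Which scheme the commit above actually uses is not stated in this log, so the choice of abs-max is an assumption, as are the function names `quantize_abs_max` and `dequantize` and the sample weights. This is not PaddleSlim's or Paddle's API.

```python
def quantize_abs_max(weights, bits=8):
    """Quantize floats to signed integers using a single abs-max scale.

    Returns the integer values and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0 for _ in weights], 1.0
    scale = max_abs / qmax
    return [int(round(w / scale)) for w in weights], scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights.
weights = [0.5, -1.0, 0.25, 0.0]
q8, scale8 = quantize_abs_max(weights, bits=8)
q16, scale16 = quantize_abs_max(weights, bits=16)
restored8 = dequantize(q8, scale8)
```

Each restored value differs from the original by at most half a quantization step, and moving from 8 to 16 bits shrinks that step by a factor of roughly 258.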

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Committed by Yiqun Liu on Feb 07, 2020

* Add the first implememtation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearange of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shape of all inputs are the same.

* Add debug information.

* Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE message.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038

polish no_grad_set of gradient and append_backward (#22440) · 50af6b5d

Committed by Aurelius84 on Feb 07, 2020

* polish backward api doc test=develop, test=document_preview,
       test=document_fix

* polish backward api doc test=develop, test=document_preview, test=document_fix

* no_grad supports set of Variable test=develop, test=document_preview

* polish sample code of append_backward test=develop, test=document_preview

* modify assert into Raise TypeError test=develop,test=document_preview

* fix unittest failed test=develop

* rm useless file test=develop

* polish en doc test=develop

* polish code of no_grad_set test=develop

* polish code of no_grad_set test=develop

50af6b5d

refine reshape_op shape error message (#22480) · 7c9ce097
Committed by Tao Luo on Feb 07, 2020
```
test=develop
```
7c9ce097
optimize performance of interpolate op (#22436) · 2b1386b2
Committed by LielinJiang on Feb 07, 2020
```
* optimize interpolate op, test=develop
```
2b1386b2

06 Feb, 2020 6 commits
- use enum class to replace the usage of enum in some condition test=develop (#22464) · 77dd0d97
  Committed by wangchaochaohu on Feb 07, 2020
  
  77dd0d97
- Correct the use of DeviceContext in unittest sequence_pooling_test and... · 44b45b9f
  Committed by Yiqun Liu on Feb 06, 2020
```
Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test (#22456)

* Add log in memory::Copy for debug purpose.

* Change to use context in DeviceContextPool directly in sequence_pooling_test, instead to new one.

* Change to use context in DeviceContextPool directly in sequence_padding_test, instead to new one.
test=develop

* Change the type of second_dim from size_t to int64_t.
test=develop
```
  44b45b9f
- R language support (#22417) · b80eef79
  Committed by flame on Feb 06, 2020
```
* R-language inference support
```
  b80eef79
- change check_type_and_dtype to check_variable_and_dtype (#22465) · 6b7bb6b5
  Committed by Tao Luo on Feb 06, 2020
  
  6b7bb6b5
- Add dequant-scale squash (#22409) · 17f2c089
  Committed by joanna.wozna.intel on Feb 06, 2020
```
* Add dequant scale squash

test=develop

* Correct dequant-scale squash test

test=develop
```
  17f2c089
- update readme of imdb training demo (#22455) · 9c4deedb
  Committed by mapingshuo on Feb 06, 2020
```
* update readme

* test=develop
```
  9c4deedb

PaddlePaddle / Paddle: last synced successfully more than 1 year ago
