Commits · 54970444ce9baf3154a3775c81280ca97d59c83d · PaddlePaddle / Paddle

11 Feb, 2020 3 commits

Improve transpose performance with tile sm copy, test=develop (#22311) · 54970444

Committed by zhaoyuchen2018 on Feb 11, 2020


* Refine code, fix select tile error, test=develop

* Refine element type and some comments, test=develop

* Refine comments and gpu utils, test=develop

* Remove some useless condition

* Refine floor and ceil, test=develop

* refine for loop. test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

54970444
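
The transpose commit above copies data tile by tile through GPU shared memory. The CUDA kernel itself is not shown on this page, so the following is only a CPU analogy in plain C++: a cache-blocked transpose that walks the matrix one square tile at a time. The function name `TiledTranspose` and the tile size `kTile` are assumptions made for illustration and are not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile edge length. The PR selects tile sizes for GPU shared
// memory, which this CPU sketch does not model.
constexpr std::size_t kTile = 32;

// Transposes a row-major rows x cols matrix into a row-major cols x rows
// matrix, visiting the data one kTile x kTile block at a time.
std::vector<float> TiledTranspose(const std::vector<float>& in,
                                  std::size_t rows, std::size_t cols) {
  assert(in.size() == rows * cols);
  std::vector<float> out(in.size());
  for (std::size_t r0 = 0; r0 < rows; r0 += kTile) {
    for (std::size_t c0 = 0; c0 < cols; c0 += kTile) {
      // Clamp the tile at the matrix edges so partial tiles are handled.
      const std::size_t r1 = std::min(r0 + kTile, rows);
      const std::size_t c1 = std::min(c0 + kTile, cols);
      for (std::size_t r = r0; r < r1; ++r) {
        for (std::size_t c = c0; c < c1; ++c) {
          out[c * rows + r] = in[r * cols + c];
        }
      }
    }
  }
  return out;
}
```

Working on one tile at a time keeps both the reads and the writes within a small region of memory, which is the same locality idea the shared-memory version relies on.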

Compile without nccl deps. [1/2] (#22509) · a90fa540

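Committed by Wilber on Feb 11, 2020

Support compiling without a dependency on nccl. [1/2]

On a multi-GPU machine, if Paddle is compiled without the WITH_NCCL switch turned on, the cards cannot communicate with each other, so only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>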

a90fa540

Make assign op support LoDTensorArray and modify while_loop API (#22309) · 3a59a7a1
Committed by guofei on Feb 11, 2020
```
This PR makes the assign op support LoDTensorArray and enables loop_vars in
while_loop to be a tuple or list.
```
3a59a7a1

10 Feb, 2020 4 commits
- [Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3... · 54a325a5
  Committed by Zhaolong Xing on Feb 10, 2020
```
[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. (#22483)

* add int8 op teller for trt.

* refine trt int8

* add int8 op teller for trt.
test=develop
```
  54a325a5
- Compile without nccl deps. [2/2] (#22484) · de009152
  Committed by Wilber on Feb 10, 2020
```
Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
```
  de009152
- Fix mismatch of std::max's argument types on Windows. (#22507) · 4b2227e9
  Committed by Yiqun Liu on Feb 10, 2020
```
test=develop
```
  4b2227e9
- fix test_fusion_seqpool_concat lod level between compile and runtime (#22488) · 870f4658
  Committed by Wilber on Feb 10, 2020
  
  870f4658
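
The `std::max` fix listed above (#22507) does not show its diff on this page. As a minimal sketch of the general problem: `std::max` is a template that takes two arguments of the same type, so mixed integer types fail template deduction. A call such as `std::max(n, 1L)` can compile on 64-bit Linux, where `int64_t` is typically `long`, and fail on Windows, where `long` is 32 bits and `int64_t` is `long long`. The helper name `ClampToAtLeastOne` is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not taken from the PR.
std::int64_t ClampToAtLeastOne(std::int64_t n) {
  // std::max(n, 1) would not compile: T would be deduced as both
  // std::int64_t and int. Naming T explicitly converts both arguments.
  return std::max<std::int64_t>(n, 1);
}
```

Spelling out the template argument keeps the call portable without depending on which built-in type a platform uses for `int64_t`.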
07 Feb, 2020 5 commits

Fix the integer overflow problem of sequence2batch (#22479) · a61d0952

Committed by Zhong Hui on Feb 07, 2020

Fix the integer overflow problem in the sequence2batch op by changing int32_t to size_t
in /paddle/fluid/operators/math/sequence2batch.h#L122.

a61d0952
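
The code at the referenced line is not reproduced on this page, so the following is only a sketch of the kind of overflow the commit message describes, assuming a 64-bit platform. An element offset computed as `row * width` can exceed the `int32_t` range for large tensors, while the same product fits in `size_t`. Both helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: element offset of (row, 0) in a row-major buffer.
// Computing row * width in std::size_t avoids the signed 32-bit overflow
// that an int32_t product would hit for large tensors.
std::size_t RowOffset(std::size_t row, std::size_t width) {
  return row * width;
}

// Returns true if the same offset would not fit in int32_t.
bool OverflowsInt32(std::size_t row, std::size_t width) {
  const std::size_t max32 =
      static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
  return width != 0 && row > max32 / width;
}
```

For example, 70000 rows of width 40000 give an offset of 2,800,000,000, which is above the `int32_t` maximum of 2,147,483,647.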

Add weight quantization in post_training_quantization (#22445) · 197913eb

Committed by cc on Feb 07, 2020

* support weight quantization in post_training_quantization, test=develop
* add test for weight quantization, test=develop

197913eb

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Committed by Yiqun Liu on Feb 07, 2020

* Add the first implementation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearrangement of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shapes of all inputs are the same.

* Add debug information.

* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE message.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038

refine reshape_op shape error message (#22480) · 7c9ce097
Committed by Tao Luo on Feb 07, 2020
```
test=develop
```
7c9ce097
optimize performance of interpolate op (#22436) · 2b1386b2
Committed by LielinJiang on Feb 07, 2020
```
* optimize interpolate op, test=develop
```
2b1386b2

06 Feb, 2020 4 commits

use enum class to replace the usage of enum in some condition test=develop (#22464) · 77dd0d97
Committed by wangchaochaohu on Feb 07, 2020

77dd0d97

Correct the use of DeviceContext in unittest sequence_pooling_test and... · 44b45b9f

Committed by Yiqun Liu on Feb 06, 2020

Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test (#22456)

* Add log in memory::Copy for debug purpose.

* Change to use the context in DeviceContextPool directly in sequence_pooling_test, instead of creating a new one.

* Change to use the context in DeviceContextPool directly in sequence_padding_test, instead of creating a new one.
test=develop

* Change the type of second_dim from size_t to int64_t.
test=develop

44b45b9f

Add dequant-scale squash (#22409) · 17f2c089
Committed by joanna.wozna.intel on Feb 06, 2020
```
* Add dequant scale squash

test=develop

* Correct dequant-scale squash test

test=develop
```
17f2c089
update readme of imdb training demo (#22455) · 9c4deedb
Committed by mapingshuo on Feb 06, 2020
```
* update readme

* test=develop
```
9c4deedb

05 Feb, 2020 3 commits

[Fix BUG]: Core when multi thread + clone + paddle-trt (#22442) · ceda0b9b
Committed by Zhaolong Xing on Feb 05, 2020
```
* add mutex for trt engine
test=develop

* add the test for copy_to_cpu
test=develop
```
ceda0b9b

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

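Committed by Wilber on Feb 05, 2020

Added WITH_NCCL to the cmake options to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

A single machine with a single card can disable NCCL compilation. Multi-card setups need NCCL enabled, which is the default; if NCCL is disabled, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>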

7bc4b095
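
A compile-time definition such as PADDLE_WITH_NCCL is usually consumed through preprocessor guards. The sketch below assumes the build defines PADDLE_WITH_NCCL when WITH_NCCL is ON, and it mirrors the rule stated in the commit message: without NCCL, only one card is usable. `kBuiltWithNccl` and `UsableDeviceCount` are hypothetical names, not Paddle APIs.

```cpp
#include <algorithm>

// PADDLE_WITH_NCCL is the definition named in the commit message; the build
// is assumed to define it when WITH_NCCL is ON.
#ifdef PADDLE_WITH_NCCL
constexpr bool kBuiltWithNccl = true;
#else
constexpr bool kBuiltWithNccl = false;
#endif

// Hypothetical helper: how many of the visible GPUs a program may use.
// Without NCCL the cards cannot communicate, so at most one is usable.
int UsableDeviceCount(int visible_devices) {
  if (visible_devices <= 0) return 0;
  return kBuiltWithNccl ? visible_devices : std::min(visible_devices, 1);
}
```

Compiled without the definition, `UsableDeviceCount(4)` returns 1; compiled with `-DPADDLE_WITH_NCCL`, it returns 4.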

fix sigmoid cudnn bug (#22439) · 943cb8c6

Committed by Tao Luo on Feb 05, 2020

* Sigmoid bug fix, test=develop

* fix code format

test=develop
Co-authored-by: Manjunath Bhat <manjunathbhat9920@gmail.com>

943cb8c6

04 Feb, 2020 3 commits
- fix copy table bug (#22432) · d51ffe86
  Committed by xujiaqi01 on Feb 04, 2020
```
* fix copy table bug where some feasigns were lost
* test=develop
```
  d51ffe86
- Support int16 for Tensor (#22423) · 822e5b36
  Committed by Leo Chen on Feb 04, 2020
```
* add int16 support, test=develop

* add test, test=develop

* fix typo, test=develop

* fix dtype error in slice, test=develop
```
  822e5b36
- remove anakin from code, test=develop (#22420) · e1b0d7cb
  Committed by 石晓伟 on Feb 04, 2020
  
  e1b0d7cb
02 Feb, 2020 2 commits
- Update the precision of pad, pad2d, pad_constant_like's unit tests from fp32 to fp64 (#22394) · 0404e7a9
  Committed by liu zhengxi on Feb 02, 2020
```
* update the ut precision of pad pad2d pad_constant_like from fp32 to fp64, test=develop
```
  0404e7a9
- add GeneralRoleMaker (#22295) · 371f377b
  Committed by xujiaqi01 on Feb 02, 2020
```
* add GeneralRoleMaker which is for general usage
* test=develop
```
  371f377b
31 Jan, 2020 2 commits

[DNNL] Fix accuracy in INT8 FC (#22404) · 269db0d1

Committed by Michał Gallus on Jan 31, 2020

* Enable quantize to reorder to nchw as well

* Correct FC MKL-DNN input dim requirements to accept 3D

* Improve DNNL FC format, error and 3D input handling

test=develop

* Improve error checking in FC

test=develop

* Improve PADDLE_ENFORCE messages in fc-related files

* Remove data layout attribute from obligatory pass args

test=develop

* Fix message in fc_mkldnn_pass to be logically correct

test=develop

269db0d1

[UT coverage]Remove unnecessary transpose op registration (#22402) · fb3086fd
Committed by joanna.wozna.intel on Jan 31, 2020

fb3086fd

25 Jan, 2020 2 commits
- [UT Coverage]Improve sum_mkldnn_op line coverage (#22275) · ade50226
  Committed by lidanqing on Jan 25, 2020
  
  ade50226
- Restore requantize squash (#22399) · 3099d9d4
  Committed by joanna.wozna.intel on Jan 25, 2020
  
  3099d9d4
23 Jan, 2020 1 commit
- improve elementwise_add_mkldnn_op test code coverage (#22359) · 92462e94
  Committed by Wojciech Uss on Jan 23, 2020
  
  92462e94
22 Jan, 2020 1 commit
- add benchmark flag for conv_transpose (#22389) · 20f30dd6
  Committed by ceci3 on Jan 22, 2020
  
  20f30dd6
21 Jan, 2020 2 commits
- polish code, test=develop (#22380) · b96c7c9a
  Committed by Leo Chen on Jan 21, 2020
```
remove unnecessary template.
```
  b96c7c9a
- Fix GEO-SGD init & send Bug (#22375) · 8f36c395
  Committed by Chengmo on Jan 21, 2020
```
* test=develop, fix geo Send & Init
```
  8f36c395
19 Jan, 2020 4 commits
- update unittest accuracy to float64 for relu, prelu, maxout (#22273) · c6f888e5
  Committed by zhupengyang on Jan 19, 2020
  
  c6f888e5
- Optimize the depthwise op test=develop (#22265) · 0d8b222b
  Committed by wangchaochaohu on Jan 19, 2020
  
  0d8b222b
- use function instead of lambda, test=develop (#22348) · aaa4fe49
  Committed by Leo Chen on Jan 19, 2020
```
* use function instead of lambda, test=develop

* follow comments, test=develop
```
  aaa4fe49
- [Bugfix] Preserve shape in inplace operators (#22360) · e7a9f6bb
  Committed by Adam on Jan 19, 2020
  
  e7a9f6bb
17 Jan, 2020 3 commits

Fix infer_shape at compile time for elementwise_op (#22291) · 2d20869c
Committed by qingqing01 on Jan 17, 2020

2d20869c

Implement a common python unittest to test the ir passes. (#22209) · b7cac50b

Committed by Yiqun Liu on Jan 17, 2020

* Implement a common python unittest to test the ir passes.
test=develop

* Save the results in np.array and support to startup on CPU.
test=develop

* Fix the unittest.
test=develop

* Add check_program to check whether the optimized program is different from the original one.
test=develop

* Remove the interface all_ops.
test=develop

* Add exception test in pass_test.
test=develop

b7cac50b

integrated HALF_ASYNC to communicator (#21869) · 82bc814a
Committed by tangwei12 on Jan 17, 2020
```
* add half_async in the communicator
* fix DistributedStrategy
```
82bc814a

16 Jan, 2020 1 commit
- remove unused code test=develop (#22327) · 1e932ecc
  Committed by wangchaochaohu on Jan 17, 2020
  
  1e932ecc

PaddlePaddle / Paddle, synced successfully about 1 year ago