提交 · 2c07727fb05de45c15f6c34e468cd50491915a8f · BaiXuePrincess / Paddle

14 10月, 2019 1 次提交
- P
  
  add DisableGlogInfo() to AnalysisConfig, test=develop (#20581) · 443f604c
  由 Pei Yang 提交于 10月 14, 2019
  
  443f604c
12 10月, 2019 1 次提交

Add ConvTranspose + BatchNorm fuse pass (#20161) · 7faa3e95

由 Adam 提交于 10月 12, 2019

* Add ConvTranspose + BatchNorm fuse pass
test=develop

* Add tests for conv+bn and conv_transpose+bn passes
test=develop

7faa3e95

16 9月, 2019 1 次提交

Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733) · c67c8758

由 Yiqun Liu 提交于 9月 16, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

* Enhance fc_fuse_pass to enable fusing relu.

* Allow print the shapes of var_desc in graph.
test=develop

* Enhance fc_fuse_pass_tester.

* Remove the use of PADDLE_ENFORCE.
test=develop

* Correct the number of ops after fusing.
test=develop

* Fix a typo.
test=develop

* Set activation_type to null when there is no relu in fc.
test=develop

* Refine fc_fuse_pass's codes.

* Enable the set of shape for tensor.

* Refine repeated_fc_relu_pass and add unittest.
test=develop

c67c8758

27 8月, 2019 1 次提交
- J
  
  Add conv dequant squash for int8 (#18905) · 2e3ec66b
  由 joanna.wozna.intel 提交于 8月 27, 2019
  
  2e3ec66b
21 8月, 2019 1 次提交

Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237) · 97d1db18

由 Adam 提交于 8月 21, 2019

* Add generalized Conv+Activation MKLDNN fuse pass creation Part2
test=develop

* Undefined behaviour of GetAttrIfExists<> FIX
test=develop

97d1db18

15 8月, 2019 1 次提交
- A
  Add generalized Conv+Activation MKLDNN fuse pass creation (#19072) · b837689e
  由 Adam 提交于 8月 15, 2019
```
test=develop
```
  b837689e
13 8月, 2019 1 次提交

Add conv reqantize squash (#18754) · 492a00f5

由 joanna.wozna.intel 提交于 8月 13, 2019

* Add requantize squash

test=develop

* Add more precise tests
test=develop

* REname and REfactor tester

test=develop

492a00f5

24 7月, 2019 1 次提交

Update trt5 for paddle-trt (#18645) · 26ae6d49

由 Zhaolong Xing 提交于 7月 24, 2019

* update paddle-trt for:
    1. fix bug: when batch > 2, core in split plugin.
    2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
    3. add new attr to dropout.
    4. shuffle channel, swish, relu6 support
    test=develop

* 1. fix ci
test=develop

26ae6d49

08 7月, 2019 1 次提交

Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532) · 88b52a27

由 Zhaolong Xing 提交于 7月 08, 2019

* Fix Mask rcnn predictor
    1. refine memory optim algorithm to support the model with the block op.
    2. output diff : modify the affine channel fuse
    3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop

* add the missing files.
test=develop

88b52a27

27 6月, 2019 1 次提交
- S
  add int8 mkldnn prior_box (#17242) · 9252e8fa
  由 Sylwester Fraczek 提交于 6月 27, 2019
```
add prior_box quantization code

add scale algo rules for prior box

test=develop
```
  9252e8fa
28 5月, 2019 1 次提交

[MKL-DNN] conv_transpose mkldnn bias pass (#17644) · 6d8075ec

由 Jacek Czaja 提交于 5月 28, 2019

* - changes to graph detector

- Changes to pass

- Added ut for new pass

- use_pass

- Added pass to mkldnn passes

- fix to registration

- improved verbose messaging for conv bias passes

- Lint fixes

test=develop

* - Lint fixes

test=develop

6d8075ec

27 5月, 2019 1 次提交

add Concat quantization (#17448) · 96845d21

由 Sylwester Fraczek 提交于 5月 27, 2019

* add Concat quantization
add unit test for quantizing concat
fix for wrong value when the input is not in map of calculated scales
add use_quantizer to concat_op.cc
add scale_algo rules for concat

test=develop

* missing fix for multiple inputs quantize-squash

* wojtuss review fix: adding comment

test=develop

96845d21

25 5月, 2019 1 次提交

TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc

由 Zhaolong Xing 提交于 5月 25, 2019

* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter

* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.

* 3. add delete_quant_dequant_pass for trt

test=develop

* 4. add the missing file
test=develop

* 5. i modify the c++ interface, but forget to modify the pybind code
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop

61221ebc

24 5月, 2019 2 次提交

[MKL-DNN] Add Fully Connected Op for inference only(#15226) · 0c39b97b

由 Michał Gallus 提交于 5月 24, 2019

* fuse mul and elementwise add to fc

* Reimplement the FC forward operator

* Fix FC MKLDNN integration by transposing weights

* Add FC MKLDNN Pass

test=develop

* FC MKLDNN Pass: change memcpy to std::copy

* Fix MKLDNN FC handling of mismatch input and weights dims

* Lower tolerance for MKL-DNN in resnet50 test

test=develop

* Adjust FC to support MKLDNN Op placement

test=develop

* Adjust Placement Op to set use_mkldnn attribute for graph

test=develop

* MKLDNN FC: fix weights format so that gemm version is called

test=develop

* FC MKLDNN: Remove tolerance decrease from tester_helper

* FC MKL-DNN: Refactor the code, change input reorder to weight reorder

* MKL-DNN FC: Introduce operator caching

test=develop

* FC MKL-DNN: Fix the tensor type in ExpectedKernelType

test=develop

* FC MKL-DNN: fix style changes

test=develop

* FC MKL-DNN: fallback to native on non-supported dim sizes

test=develop

* FC MKLDNN: fix CMake paths

test=develop

* FC MKLDNN: Refine placement pass graph mkldnn attribute

test=develop

* Fix Transpiler error for fuse_conv_eltwise

test=develop

* Fix missing STL includes in files

test=develop

* FC MKL-DNN: Enable new output size computation

Also, refine pass to comply with newest interface.
test=develop

* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled

* FC MKL-DNN: Allow Weights to use oi or io format

* FC MKL-DNN: Adjust UT to work with correct dims

test=develop

* Enable MKL DEBUG for resnet50 analyzer

test=develop

* FC MKL-DNN: Improve Hashing function

test=develop

* FC MKL-DNN: Fix shape for fc weights in transpiler

* FC MKL-DNN: Update input pointer in re-used fc primitive

* Add log for not handling fc fuse for unsupported dims

test=develop

* FC MKL-DNN: Move transpose from pass to Op Kernel

test=develop

* FC MKL-DNN: Disable transpose in unit test

test=develop

* FC MKL-DNN: Remove fc_mkldnn_pass from default list

* Correct Flag for fake data analyzer tests

test=develop

* FC MKL-DNN: Add comment about fc mkldnn pass disablement

test=develop

* FC MKL-DNN: Disable fc in int8 tests

test=develop

0c39b97b

Conv concat relu quantization (#17466) · 5b2a3c4b

由 Sylwester Fraczek 提交于 5月 24, 2019

* add conv_concat_relu fuse

test=develop

* add test code

test=develop

* added missing include with unordered_map

test=develop

* review fixes for wojtuss

test=develop

* remove 'should (not) be fused' comment statements

one of them was invalid anyway

test=develop

5b2a3c4b

22 5月, 2019 1 次提交

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

由 guomingz 提交于 5月 22, 2019

* Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization.

Below table shows the benchmark(FPS) which measured on skx-8180(28 cores)
Batch size | with fusion | without fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280

test=develop

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Let the parameter definition embedded into the code.
That's will make the code easy to understand.

test=develop

2281ebf0

07 5月, 2019 1 次提交

石

Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a

由石晓伟提交于 5月 07, 2019

* cherry-pick commit from 88770542

* cherry-pick commit from 3f0b97df

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc2)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e1)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab180)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa423)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066f)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3ec)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea89)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a06)

* Modify symbol export rules

test=develop

a72dbe9a

28 3月, 2019 1 次提交

Anakin ssd support · d065b5bf

由 nhzlx 提交于 3月 28, 2019

refine trt first run
add quant dequant fuse pass
omit simplify_anakin_priorbox_detection template
omit transpose_flatten_concat_fuse template
test=develop

d065b5bf

20 3月, 2019 4 次提交
- N
  
  cherry-pick from feature/anakin-engine: deal the changing shape when using anakin #16189 · a25331bc
  由 nhzlx 提交于 3月 20, 2019
  
  a25331bc
- N
  cherry-pick from feature/anakin-engine: refine anakin subgraph. #16157 · 69d37f81
  由 nhzlx 提交于 3月 20, 2019
```
support change input size
```
  69d37f81
- N
  
  cherry-pick from feature/anakin-engine: Anakin support facebox #16111 · a1d200a5
  由 nhzlx 提交于 3月 20, 2019
  
  a1d200a5
- W
  fix pattern maching conv2d with(out) ResidualData · 104a9f1e
  由 Wojciech Uss 提交于 3月 20, 2019
```
test=develop
```
  104a9f1e
19 3月, 2019 1 次提交
- Z
  add allocator flags · 22715487
  由 zhhsplendid 提交于 3月 19, 2019
```
test=develop
```
  22715487
18 3月, 2019 1 次提交

Add cpu_quantize_pass for C-API quantization (#16127) · 2579ade4

由 Wojciech Uss 提交于 3月 18, 2019

* Add cpu_quantize_pass for C-API quantization

test=develop

* add cpu_quantize_pass test

* fix lint: add include memory unorderd_map and unordered_set

test=develop

* fuse_relu 1

test=develop

* tuned 2 without squash

* fixes

test=develop

* remove unused vars

test=develop

* refactored

test=develop

* fix lint c-style cast -> C++ style cast

test=develop

* remove QuantMax and c style casts

test=develop

* last usage of QuantMax removed

test=develop

* Fix Analysis Predictor UT

Check if memory_optimize_pass has already been added
to the analysis config before adding a new one, so
that it is not added multiple times.
test=develop

* change map to unordered_map

fix the forgotten part of cpu_quantize_pass_tester.cc

test=develop

* removed quantized attribute

* fixed cpu_quantize_pass_tester and op attr comments

test=develop

* removed redundant line

test=debug

* removed gmock

test=develop

* fix after merge

2579ade4

14 3月, 2019 1 次提交

Add cpu_quantize_squash_pass for C-API quantization (#16128) · b9252f3d

由 Wojciech Uss 提交于 3月 14, 2019

* Add cpu_quantize_squash_pass for C-API quantization

test=develop

* add cpu_quantize_squash_pass teste

* fix lint: add include memory unorderd_map and unordered_set

test=develop

* lint fix 2

* fixes

test=develop

* refactored

test=develop

* fix windows ci

test=develop

b9252f3d

19 2月, 2019 1 次提交
- T
  fix warnings (#15790) · e1c707fe
  由 tensor-tang 提交于 2月 19, 2019
```
* fix warnings

test=develop

* fix enforce test

test=develop
```
  e1c707fe
31 1月, 2019 1 次提交
- Y
  
  remove dot marked node (#15606) · dc5e25fc
  由 Yan Chunwei 提交于 1月 31, 2019
  
  dc5e25fc
11 1月, 2019 1 次提交
- Z
  
  add_transpose_flatten_concat_fuse (#15121) · 98e85f37
  由 Zhaolong Xing 提交于 1月 11, 2019
  
  98e85f37
26 12月, 2018 2 次提交
- N
  faster rcnn input is presistable. (fix it in paddle-trt) · a6aa8ea7
  由 nhzlx 提交于 12月 26, 2018
```
test=develop
```
  a6aa8ea7
- H
  Fix conv_elementwise_add2_act pass · 956cf921
  由 hjchen2 提交于 12月 26, 2018
```
test=develop
```
  956cf921
25 12月, 2018 1 次提交
- N
  add affine_channel fuse. · ce3782c1
  由 nhzlx 提交于 12月 25, 2018
```
fix conv+elemenwise fuse bug.
```
  ce3782c1
16 12月, 2018 1 次提交
- N
  add conv+elementwiseadd pass · 4e4a7772
  由 nhzlx 提交于 12月 16, 2018
```
test=develop
```
  4e4a7772
14 12月, 2018 1 次提交
- Y
  
  Fea/fuse conv elementwise add fuse (#14669) · a985949b
  由 Yan Chunwei 提交于 12月 14, 2018
  
  a985949b
07 12月, 2018 1 次提交
- Y
  Clean Code · 240d974a
  由 Yihua Xu 提交于 12月 07, 2018
```
test=develop
```
  240d974a
03 12月, 2018 1 次提交
- Y
  Implement the fusion of convolution and bias for mkldnn · 64e261c6
  由 Yihua Xu 提交于 12月 03, 2018
```
(test=develop)
```
  64e261c6
26 11月, 2018 1 次提交
- M
  Revert the changes of VLOG · 53433d7f
  由 minqiyang 提交于 11月 26, 2018
```
test=develop
```
  53433d7f
16 11月, 2018 1 次提交

MKLDNN residual connections fuse pass: · 7423748e

由 Tomasz Patejko 提交于 11月 06, 2018

* implements reachability check between identity node and non-identity argument to elementwise_add
* implements handling identity node as x and as y argument to elementwise_add

7423748e

12 11月, 2018 2 次提交
- T
  speedup DetectPatterns · 668ae523
  由 Tao Luo 提交于 11月 12, 2018
```
test=develop
```
  668ae523
- Y
  
  fix mac graph detector sort (#14356) · 9a6e2392
  由 Yan Chunwei 提交于 11月 12, 2018
  
  9a6e2392
09 11月, 2018 1 次提交

Exhaustive search for cuDNN conv. (#14286) · abe20923

由 qingqing01 提交于 11月 09, 2018

* exhaustive search for cuDNN conv.
* Refine code and add unit testing.
* Fix model load in fluid/inference and unit testing in conv2d
* Follow comments.
* Fix compiling test=develop

abe20923

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致