Commits · bc9fd1fc10f5ff3a2310cd1e04b40309ce508cf7 · PaddlePaddle / PaddleDetection

08 Jul, 2019 1 commit

CHERRY-Pick: Inference: fix mask rcnn model diff, optim memory usage, memory leak. #18532 (#18547) · bc9fd1fc

Committed by Zhaolong Xing on Jul 08, 2019

fix mask rcnn
add interface for setting optim_cache_dir (e.g., in trt int8 mode, when the model is loaded from memory, there should be an interface for setting the trt calibration table data dir)

test=release/1.5

bc9fd1fc

28 May, 2019 1 commit

[MKL-DNN] conv_transpose mkldnn bias pass (#17644) · 6d8075ec

Committed by Jacek Czaja on May 28, 2019

* - changes to graph detector

- Changes to pass

- Added ut for new pass

- use_pass

- Added pass to mkldnn passes

- fix to registration

- improved verbose messaging for conv bias passes

- Lint fixes

test=develop

* - Lint fixes

test=develop

6d8075ec

27 May, 2019 1 commit

add Concat quantization (#17448) · 96845d21

Committed by Sylwester Fraczek on May 27, 2019

* add Concat quantization
add unit test for quantizing concat
fix for wrong value when the input is not in map of calculated scales
add use_quantizer to concat_op.cc
add scale_algo rules for concat

test=develop

* missing fix for multiple inputs quantize-squash

* wojtuss review fix: adding comment

test=develop

96845d21

25 May, 2019 1 commit

TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc

Committed by Zhaolong Xing on May 25, 2019

* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter

* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.

* 3. add delete_quant_dequant_pass for trt

test=develop

* 4. add the missing file
test=develop

* 5. I modified the C++ interface but forgot to modify the pybind code
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop

61221ebc

24 May, 2019 2 commits

[MKL-DNN] Add Fully Connected Op for inference only(#15226) · 0c39b97b

Committed by Michał Gallus on May 24, 2019

* fuse mul and elementwise add to fc

* Reimplement the FC forward operator

* Fix FC MKLDNN integration by transposing weights

* Add FC MKLDNN Pass

test=develop

* FC MKLDNN Pass: change memcpy to std::copy

* Fix MKLDNN FC handling of mismatch input and weights dims

* Lower tolerance for MKL-DNN in resnet50 test

test=develop

* Adjust FC to support MKLDNN Op placement

test=develop

* Adjust Placement Op to set use_mkldnn attribute for graph

test=develop

* MKLDNN FC: fix weights format so that gemm version is called

test=develop

* FC MKLDNN: Remove tolerance decrease from tester_helper

* FC MKL-DNN: Refactor the code, change input reorder to weight reorder

* MKL-DNN FC: Introduce operator caching

test=develop

* FC MKL-DNN: Fix the tensor type in ExpectedKernelType

test=develop

* FC MKL-DNN: fix style changes

test=develop

* FC MKL-DNN: fallback to native on non-supported dim sizes

test=develop

* FC MKLDNN: fix CMake paths

test=develop

* FC MKLDNN: Refine placement pass graph mkldnn attribute

test=develop

* Fix Transpiler error for fuse_conv_eltwise

test=develop

* Fix missing STL includes in files

test=develop

* FC MKL-DNN: Enable new output size computation

Also, refine pass to comply with newest interface.
test=develop

* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled

* FC MKL-DNN: Allow Weights to use oi or io format

* FC MKL-DNN: Adjust UT to work with correct dims

test=develop

* Enable MKL DEBUG for resnet50 analyzer

test=develop

* FC MKL-DNN: Improve Hashing function

test=develop

* FC MKL-DNN: Fix shape for fc weights in transpiler

* FC MKL-DNN: Update input pointer in re-used fc primitive

* Add log for not handling fc fuse for unsupported dims

test=develop

* FC MKL-DNN: Move transpose from pass to Op Kernel

test=develop

* FC MKL-DNN: Disable transpose in unit test

test=develop

* FC MKL-DNN: Remove fc_mkldnn_pass from default list

* Correct Flag for fake data analyzer tests

test=develop

* FC MKL-DNN: Add comment about fc mkldnn pass disablement

test=develop

* FC MKL-DNN: Disable fc in int8 tests

test=develop

0c39b97b

Conv concat relu quantization (#17466) · 5b2a3c4b

Committed by Sylwester Fraczek on May 24, 2019

* add conv_concat_relu fuse

test=develop

* add test code

test=develop

* added missing include with unordered_map

test=develop

* review fixes for wojtuss

test=develop

* remove 'should (not) be fused' comment statements

one of them was invalid anyway

test=develop

5b2a3c4b

22 May, 2019 1 commit

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

Committed by guomingz on May 22, 2019

* Relu6 is the bottleneck op for Mobilenet-v2. Since mkldnn supports conv/relu6 fusion, we implement this fusion as a pass. Because int8 support for this fusion will arrive in MKLDNN v0.20, this PR focuses on the fp32 optimization.

The table below shows the benchmark (FPS) measured on skx-8180 (28 cores):
Batch size | with fusion (FPS) | without fusion (FPS)
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280

test=develop
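
To make the table easier to read, the following sketch computes the speedup implied by the reported FPS figures. The numbers are copied from the commit message; the names `benchmarks` and `speedup` are illustrative and do not come from the PR.

```python
# FPS figures copied from the benchmark table in the commit message.
# Keys are batch sizes; the dictionary layout is an illustrative choice.
benchmarks = {
    1: {"with_fusion": 214.7, "without_fusion": 53.4},
    50: {"with_fusion": 1219.727, "without_fusion": 137.280},
}


def speedup(with_fusion: float, without_fusion: float) -> float:
    """Ratio of fused throughput to unfused throughput."""
    return with_fusion / without_fusion


speedups = {bs: speedup(**row) for bs, row in benchmarks.items()}

for batch_size, ratio in speedups.items():
    print(f"batch size {batch_size}: {ratio:.2f}x faster with fusion")
```

By these figures the fusion gives roughly a 4x gain at batch size 1 and close to a 9x gain at batch size 50.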

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Embed the parameter definitions in the code.
That will make the code easier to understand.

test=develop

2281ebf0

07 May, 2019 1 commit

Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a

Committed by 石晓伟 on May 07, 2019

* cherry-pick commit from 88770542

* cherry-pick commit from 3f0b97df

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc2)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e1)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab180)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa423)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066f)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3ec)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea89)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a06)

* Modify symbol export rules

test=develop

a72dbe9a

28 Mar, 2019 1 commit

Anakin ssd support · d065b5bf

Committed by nhzlx on Mar 28, 2019

refine trt first run
add quant dequant fuse pass
omit simplify_anakin_priorbox_detection template
omit transpose_flatten_concat_fuse template
test=develop

d065b5bf

20 Mar, 2019 3 commits
- cherry-pick from feature/anakin-engine: deal with the changing shape when using anakin #16189 · a25331bc
  Committed by nhzlx on Mar 20, 2019
  
  a25331bc
- cherry-pick from feature/anakin-engine: Anakin support facebox #16111 · a1d200a5
  Committed by nhzlx on Mar 20, 2019
  
  a1d200a5
- fix pattern matching conv2d with(out) ResidualData · 104a9f1e
  Committed by Wojciech Uss on Mar 20, 2019
```
test=develop
```
  104a9f1e
19 Mar, 2019 1 commit
- add allocator flags · 22715487
  Committed by zhhsplendid on Mar 19, 2019
```
test=develop
```
  22715487
18 Mar, 2019 1 commit

Add cpu_quantize_pass for C-API quantization (#16127) · 2579ade4

Committed by Wojciech Uss on Mar 18, 2019

* Add cpu_quantize_pass for C-API quantization

test=develop

* add cpu_quantize_pass test

* fix lint: add include memory unordered_map and unordered_set

test=develop

* fuse_relu 1

test=develop

* tuned 2 without squash

* fixes

test=develop

* remove unused vars

test=develop

* refactored

test=develop

* fix lint c-style cast -> C++ style cast

test=develop

* remove QuantMax and c style casts

test=develop

* last usage of QuantMax removed

test=develop

* Fix Analysis Predictor UT

Check if memory_optimize_pass has already been added
to the analysis config before adding a new one, so
that it is not added multiple times.
test=develop
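
The check described above amounts to an idempotent append. The following is a minimal sketch of that idea, assuming the pass list can be modeled as a plain list of pass names; `append_pass_once` and the list contents are hypothetical and are not the analysis config's actual API.

```python
from typing import List


def append_pass_once(passes: List[str], name: str) -> bool:
    """Append `name` only if it is not already in `passes`.

    Returns True when the pass was appended and False when it was
    already present. Hypothetical helper, not part of Paddle.
    """
    if name in passes:
        return False
    passes.append(name)
    return True


# Hypothetical pass list; only "memory_optimize_pass" is named in the log.
pipeline = ["some_other_pass"]
first = append_pass_once(pipeline, "memory_optimize_pass")   # appended
second = append_pass_once(pipeline, "memory_optimize_pass")  # skipped
```

Calling the helper twice leaves exactly one `memory_optimize_pass` entry in the list, which is the property the unit-test fix relies on.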

* change map to unordered_map

fix the forgotten part of cpu_quantize_pass_tester.cc

test=develop

* removed quantized attribute

* fixed cpu_quantize_pass_tester and op attr comments

test=develop

* removed redundant line

test=debug

* removed gmock

test=develop

* fix after merge

2579ade4

14 Mar, 2019 1 commit

Add cpu_quantize_squash_pass for C-API quantization (#16128) · b9252f3d

Committed by Wojciech Uss on Mar 14, 2019

* Add cpu_quantize_squash_pass for C-API quantization

test=develop

* add cpu_quantize_squash_pass test

* fix lint: add include memory unordered_map and unordered_set

test=develop

* lint fix 2

* fixes

test=develop

* refactored

test=develop

* fix windows ci

test=develop

b9252f3d

11 Jan, 2019 1 commit
- add_transpose_flatten_concat_fuse (#15121) · 98e85f37
  Committed by Zhaolong Xing on Jan 11, 2019
  
  98e85f37
25 Dec, 2018 1 commit
- add affine_channel fuse. · ce3782c1
  Committed by nhzlx on Dec 25, 2018
```
fix conv+elementwise fuse bug.
```
  ce3782c1
16 Dec, 2018 1 commit
- add conv+elementwiseadd pass · 4e4a7772
  Committed by nhzlx on Dec 16, 2018
```
test=develop
```
  4e4a7772
14 Dec, 2018 1 commit
- Fea/fuse conv elementwise add fuse (#14669) · a985949b
  Committed by Yan Chunwei on Dec 14, 2018
  
  a985949b
03 Dec, 2018 1 commit
- Implement the fusion of convolution and bias for mkldnn · 64e261c6
  Committed by Yihua Xu on Dec 03, 2018
```
(test=develop)
```
  64e261c6
16 Nov, 2018 1 commit

MKLDNN residual connections fuse pass: · 7423748e

Committed by Tomasz Patejko on Nov 06, 2018

* implements reachability check between identity node and non-identity argument to elementwise_add
* implements handling identity node as x and as y argument to elementwise_add
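
The log does not say how the reachability check is implemented or in which direction it runs. As a rough illustration only, a reachability query on a directed graph can be sketched with a breadth-first search; the adjacency-dict representation, `is_reachable`, and the toy node names below are assumptions and are not taken from the pass.

```python
from collections import deque
from typing import Dict, List


def is_reachable(successors: Dict[str, List[str]], start: str, target: str) -> bool:
    """Return True if `target` can be reached from `start` along directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical toy graph: `x` feeds a conv, and both `x` and the conv
# output feed elementwise_add (a residual connection).
toy_graph = {
    "x": ["conv", "elementwise_add"],
    "conv": ["elementwise_add"],
    "elementwise_add": ["out"],
}

forward = is_reachable(toy_graph, "x", "elementwise_add")
backward = is_reachable(toy_graph, "out", "x")
```

In this toy graph `forward` is true and `backward` is false, since edges are only followed in their stated direction.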

7423748e

14 Nov, 2018 1 commit
- Combine Inference Analysis with IR (#13914) · 9f252e00
  Committed by Yan Chunwei on Nov 14, 2018
  
  9f252e00
21 Oct, 2018 7 commits
- MKLDNN conv + elementwise_add fusion: UT for missing bias added. UTs... · ce2464fd
  Committed by Tomasz Patejko on Oct 19, 2018
```
MKLDNN conv + elementwise_add fusion: UT for missing bias added. UTs refactored. Some minor changes in the pass
```
  ce2464fd
- MKLDNN conv + elementwise_add fusion: fix for order of parameters in elementwise_add in resnet50 · 0fe3079c
  Committed by Tomasz Patejko on Oct 16, 2018
```
test=develop
```
  0fe3079c
- MKLDNN conv + elementwise_add fusion: skip connection attribute renamed.... · 4be45af1
  Committed by Tomasz Patejko on Sep 27, 2018
```
MKLDNN conv + elementwise_add fusion: skip connection attribute renamed. Comments about patterns added.

test=develop
```
  4be45af1
- MKLDNN conv + elementwise_add fusion: changed a name of a formal argument in ElementwiseAdd pattern · 9a335e02
  Committed by Tomasz Patejko on Sep 27, 2018
  
  9a335e02
- MKLDNN conv + elementwise_add fusion: implementation changed to conform with Paddle API · efd76614
  Committed by Tomasz Patejko on Sep 26, 2018
  
  efd76614
- refine fuse pattern and attr · 40f8456a
  Committed by tensor-tang on Oct 21, 2018
```
test=develop
```
  40f8456a
- add seqconv eltadd relu pass · 603ba5e0
  Committed by tensor-tang on Oct 19, 2018
  
  603ba5e0
19 Oct, 2018 1 commit
- Conv+Bias fuse · 582f59c1
  Committed by Michal Gallus on Oct 12, 2018
  
  582f59c1
11 Oct, 2018 1 commit
- Revert "[MKLDNN] Pass: Fuse Conv + Bias" · 9b11a175
  Committed by Tao Luo on Oct 11, 2018
  
  9b11a175
10 Oct, 2018 1 commit
- Pass: Fuse Conv + Bias · 40b17be4
  Committed by Michal Gallus on Oct 01, 2018
```
test=develop
```
  40b17be4
08 Oct, 2018 1 commit

conv bn fuse pass · 78f98294

Committed by Sylwester Fraczek on Sep 19, 2018

review fix

review from hshen14 fix

test=develop

fix error in broadcast and code cleanup

rename bias -> eltwise and added macro to shorten code

formatting

78f98294

27 Sep, 2018 1 commit

- Added initial pass for embedding-fc-lstm · 7ab5626d

Committed by Jacek Czaja on Sep 13, 2018

- Added draft of new operator

- Added fused embedding fc lstm files

- First time embedding_fc_lstm_fuse_pass was invoked in
  test_text_classification

- Added Embedding pattern

- Not crashing

- Enabled draft of embedding_fc_lstm pass (does its job)

- First working (Seqcompute only) version

- Removed diagnostic comment

- First enabling of BatchCompute

- Disabling pass for embedding with is_sparse and is_distributed

- Cosmetics

- Style

- Style

7ab5626d

25 Sep, 2018 1 commit
- make bias unnecessary for ConvRelu fuse · a49aa4da
  Committed by Sylwester Fraczek on Sep 24, 2018
  
  a49aa4da
20 Sep, 2018 1 commit

Feature/op_fuse_pass (#12440) · d402234b

Committed by chengduo on Sep 20, 2018

* Add Preface

* Add demo code

* Save file

* Refine code

* seems to work

* use elementwise strategy

* Use ElementwiseComputeEx

* Add comments

* extract functions from operator

* Refine code

* Follow comment

* code refine

* add op_fuse pass

* add backward

* code refine

* use TopologySortOperations

* follow comments

* refine IsFusible

* code enhance

* fix op_fusion_pass

* refine code

* refine fuse_elemwise_act_op

* adjust the input and output

* refine logic

* add intermediate_edge

* disable inplace

* follow comments

* refine logic

* follow comments

* Remove the removable IntermediateOut

* change strategy

* code refine

* enable fuse backward

* code refine

* code refine

* rename unit test

* follow comments

d402234b

12 Sep, 2018 1 commit
- create conv relu pass for MKLDNN (#13258) · 41de582b
  Committed by Sylwester Fraczek on Sep 12, 2018
  
  41de582b
10 Sep, 2018 2 commits
- simple fix · 6b2f680d
  Committed by superjomn on Sep 10, 2018
  
  6b2f680d
- refactor ir pattern (#13304) · 478a4e85
  Committed by Yan Chunwei on Sep 10, 2018
  
  478a4e85
06 Sep, 2018 1 commit
- add fuse fc gru pass · f057077c
  Committed by tensor-tang on Sep 05, 2018
  
  f057077c

PaddlePaddle / PaddleDetection synced successfully about 1 year ago