Commits · 432ac70124b802ca9a18900a3d1a92b73c8bae99 · BaiXuePrincess / Paddle

25 May, 2019 1 commit

TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc

Committed by Zhaolong Xing on May 25, 2019

* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter

* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.

* 3. add delete_quant_dequant_pass for trt

test=develop

* 4. add the missing file
test=develop

* 5. update the pybind code to match the modified C++ interface
fix the IS_TRT_VERSION_GE bug, and fix the elementwise op converter
test=develop
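
As a rough illustration of what a per-tensor dynamic range means in int8 mode, the sketch below applies generic symmetric quantization: a float range `[-dynamic_range, dynamic_range]` is mapped onto the int8 interval `[-127, 127]`. The function names are hypothetical and this is not the Paddle or TensorRT implementation; it only shows the arithmetic under that assumption.

```python
def quantize_int8(x, dynamic_range):
    """Map a float to int8 with a symmetric range (illustrative assumption)."""
    if dynamic_range <= 0:
        raise ValueError("dynamic_range must be positive")
    scale = dynamic_range / 127.0          # float value of one int8 step
    q = round(x / scale)
    return max(-127, min(127, q))          # clamp values outside the range


def dequantize_int8(q, dynamic_range):
    """Recover an approximate float from the int8 value."""
    return q * (dynamic_range / 127.0)
```

Values whose magnitude exceeds the range saturate at ±127, which is why the range chosen during quantization-aware training has to match the one used at inference time.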

61221ebc

24 May, 2019 2 commits

[MKL-DNN] Add Fully Connected Op for inference only(#15226) · 0c39b97b

Committed by Michał Gallus on May 24, 2019

* fuse mul and elementwise add to fc

* Reimplement the FC forward operator

* Fix FC MKLDNN integration by transposing weights

* Add FC MKLDNN Pass

test=develop

* FC MKLDNN Pass: change memcpy to std::copy

* Fix MKLDNN FC handling of mismatch input and weights dims

* Lower tolerance for MKL-DNN in resnet50 test

test=develop

* Adjust FC to support MKLDNN Op placement

test=develop

* Adjust Placement Op to set use_mkldnn attribute for graph

test=develop

* MKLDNN FC: fix weights format so that gemm version is called

test=develop

* FC MKLDNN: Remove tolerance decrease from tester_helper

* FC MKL-DNN: Refactor the code, change input reorder to weight reorder

* MKL-DNN FC: Introduce operator caching

test=develop

* FC MKL-DNN: Fix the tensor type in ExpectedKernelType

test=develop

* FC MKL-DNN: fix style changes

test=develop

* FC MKL-DNN: fallback to native on non-supported dim sizes

test=develop

* FC MKLDNN: fix CMake paths

test=develop

* FC MKLDNN: Refine placement pass graph mkldnn attribute

test=develop

* Fix Transpiler error for fuse_conv_eltwise

test=develop

* Fix missing STL includes in files

test=develop

* FC MKL-DNN: Enable new output size computation

Also, refine pass to comply with newest interface.
test=develop

* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled

* FC MKL-DNN: Allow Weights to use oi or io format

* FC MKL-DNN: Adjust UT to work with correct dims

test=develop

* Enable MKL DEBUG for resnet50 analyzer

test=develop

* FC MKL-DNN: Improve Hashing function

test=develop

* FC MKL-DNN: Fix shape for fc weights in transpiler

* FC MKL-DNN: Update input pointer in re-used fc primitive

* Add log for not handling fc fuse for unsupported dims

test=develop

* FC MKL-DNN: Move transpose from pass to Op Kernel

test=develop

* FC MKL-DNN: Disable transpose in unit test

test=develop

* FC MKL-DNN: Remove fc_mkldnn_pass from default list

* Correct Flag for fake data analyzer tests

test=develop

* FC MKL-DNN: Add comment about fc mkldnn pass disablement

test=develop

* FC MKL-DNN: Disable fc in int8 tests

test=develop
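
The first item in the list above fuses `mul` and `elementwise_add` into a single `fc` op. A minimal pure-Python sketch of why that is safe follows: the fused form computes the same `x * W + b` in one pass. The helper functions are illustrative stand-ins on nested lists, not Paddle operators.

```python
def mul(x, w):
    """Matrix product of x (n x k) and w (k x m), given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def elementwise_add(y, bias):
    """Add a bias vector to every row of y."""
    return [[v + b for v, b in zip(row, bias)] for row in y]


def fc(x, w, bias):
    """Fused version: product and bias add computed together."""
    return [[sum(a * b for a, b in zip(row, col)) + bj
             for col, bj in zip(zip(*w), bias)]
            for row in x]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
bias = [0.5, -0.5, 1.0]

unfused = elementwise_add(mul(x, w), bias)
fused = fc(x, w, bias)
```

Here `fused` and `unfused` hold identical values; the benefit of fusion is one kernel launch and no intermediate tensor, not a different result.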

0c39b97b

Conv concat relu quantization (#17466) · 5b2a3c4b

Committed by Sylwester Fraczek on May 24, 2019

* add conv_concat_relu fuse

test=develop

* add test code

test=develop

* added missing include with unordered_map

test=develop

* review fixes for wojtuss

test=develop

* remove 'should (not) be fused' comment statements

one of them was invalid anyway

test=develop

5b2a3c4b

22 May, 2019 1 commit

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

Committed by guomingz on May 22, 2019

* Relu6 is the bottleneck op for MobileNet-v2. Since MKL-DNN supports conv/relu6 fusion, we implement the fusion as a pass. Because int8 support for this fusion will only arrive in MKL-DNN v0.20, this PR focuses on the fp32 optimization.

The table below shows the benchmark results (FPS) measured on skx-8180 (28 cores):
Batch size | with fusion | without fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280

test=develop

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Embed the parameter definitions in the code.
That will make the code easier to understand.

test=develop
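
The speedup implied by the benchmark table in this commit message can be worked out directly from the reported FPS figures. The snippet below only redoes that arithmetic; the dictionary layout is an illustrative choice.

```python
# FPS figures copied from the benchmark table: batch size -> (with fusion, without fusion)
benchmark_fps = {1: (214.7, 53.4), 50: (1219.727, 137.280)}

speedup = {bs: fused / unfused for bs, (fused, unfused) in benchmark_fps.items()}

for bs, ratio in speedup.items():
    print(f"batch size {bs}: {ratio:.2f}x")
# prints:
# batch size 1: 4.02x
# batch size 50: 8.88x
```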

2281ebf0

07 May, 2019 1 commit

Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a

Committed by 石晓伟 on May 07, 2019

* cherry-pick commit from 88770542

* cherry-pick commit from 3f0b97df

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc2)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e1)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab180)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa423)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066f)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3ec)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea89)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a06)

* Modify symbol export rules

test=develop

a72dbe9a

28 March, 2019 1 commit

Anakin ssd support · d065b5bf

Committed by nhzlx on March 28, 2019

refine trt first run
add quant dequant fuse pass
omit simplify_anakin_priorbox_detection template
omit transpose_flatten_concat_fuse template
test=develop

d065b5bf

20 March, 2019 4 commits
- N
  
  cherry-pick from feature/anakin-engine: handle changing shapes when using anakin #16189 · a25331bc
  Committed by nhzlx on March 20, 2019
  
  a25331bc
- N
  cherry-pick from feature/anakin-engine: refine anakin subgraph. #16157 · 69d37f81
  Committed by nhzlx on March 20, 2019
```
support change input size
```
  69d37f81
- N
  
  cherry-pick from feature/anakin-engine: Anakin support facebox #16111 · a1d200a5
  Committed by nhzlx on March 20, 2019
  
  a1d200a5
- W
  fix pattern matching conv2d with(out) ResidualData · 104a9f1e
  Committed by Wojciech Uss on March 20, 2019
```
test=develop
```
  104a9f1e
19 March, 2019 1 commit
- Z
  add allocator flags · 22715487
  Committed by zhhsplendid on March 19, 2019
```
test=develop
```
  22715487
18 March, 2019 1 commit

Add cpu_quantize_pass for C-API quantization (#16127) · 2579ade4

Committed by Wojciech Uss on March 18, 2019

* Add cpu_quantize_pass for C-API quantization

test=develop

* add cpu_quantize_pass test

* fix lint: add includes for memory, unordered_map and unordered_set

test=develop

* fuse_relu 1

test=develop

* tuned 2 without squash

* fixes

test=develop

* remove unused vars

test=develop

* refactored

test=develop

* fix lint c-style cast -> C++ style cast

test=develop

* remove QuantMax and c style casts

test=develop

* last usage of QuantMax removed

test=develop

* Fix Analysis Predictor UT

Check if memory_optimize_pass has already been added
to the analysis config before adding a new one, so
that it is not added multiple times.
test=develop

* change map to unordered_map

fix the forgotten part of cpu_quantize_pass_tester.cc

test=develop

* removed quantized attribute

* fixed cpu_quantize_pass_tester and op attr comments

test=develop

* removed redundant line

test=debug

* removed gmock

test=develop

* fix after merge
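
The Analysis Predictor UT fix in the list above describes checking whether `memory_optimize_pass` is already present before adding it again. A minimal sketch of that check-before-append idea follows; `PassList` and `append_once` are hypothetical names for illustration and do not correspond to Paddle's actual analysis config API.

```python
class PassList:
    """Hypothetical stand-in for an ordered list of analysis passes."""

    def __init__(self, passes=()):
        self.passes = list(passes)

    def append_once(self, name):
        """Append a pass only if it is not already registered."""
        if name in self.passes:
            return False          # already present, nothing to do
        self.passes.append(name)
        return True


config = PassList(["infer_clean_graph_pass"])
config.append_once("memory_optimize_pass")
config.append_once("memory_optimize_pass")   # second call is a no-op
```

After both calls the list contains `memory_optimize_pass` exactly once, so repeated configuration does not run the pass multiple times.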

2579ade4

14 March, 2019 1 commit

Add cpu_quantize_squash_pass for C-API quantization (#16128) · b9252f3d

Committed by Wojciech Uss on March 14, 2019

* Add cpu_quantize_squash_pass for C-API quantization

test=develop

* add cpu_quantize_squash_pass test

* fix lint: add includes for memory, unordered_map and unordered_set

test=develop

* lint fix 2

* fixes

test=develop

* refactored

test=develop

* fix windows ci

test=develop

b9252f3d

19 February, 2019 1 commit
- T
  fix warnings (#15790) · e1c707fe
  Committed by tensor-tang on February 19, 2019
```
* fix warnings

test=develop

* fix enforce test

test=develop
```
  e1c707fe
31 January, 2019 1 commit
- Y
  
  remove dot marked node (#15606) · dc5e25fc
  Committed by Yan Chunwei on January 31, 2019
  
  dc5e25fc
11 January, 2019 1 commit
- Z
  
  add_transpose_flatten_concat_fuse (#15121) · 98e85f37
  Committed by Zhaolong Xing on January 11, 2019
  
  98e85f37
26 December, 2018 2 commits
- N
  faster rcnn input is persistable. (fix it in paddle-trt) · a6aa8ea7
  Committed by nhzlx on December 26, 2018
```
test=develop
```
  a6aa8ea7
- H
  Fix conv_elementwise_add2_act pass · 956cf921
  Committed by hjchen2 on December 26, 2018
```
test=develop
```
  956cf921
25 December, 2018 1 commit
- N
  add affine_channel fuse. · ce3782c1
  Committed by nhzlx on December 25, 2018
```
fix conv+elementwise fuse bug.
```
  ce3782c1
16 December, 2018 1 commit
- N
  add conv+elementwiseadd pass · 4e4a7772
  Committed by nhzlx on December 16, 2018
```
test=develop
```
  4e4a7772
14 December, 2018 1 commit
- Y
  
  Fea/fuse conv elementwise add fuse (#14669) · a985949b
  Committed by Yan Chunwei on December 14, 2018
  
  a985949b
07 December, 2018 1 commit
- Y
  Clean Code · 240d974a
  Committed by Yihua Xu on December 07, 2018
```
test=develop
```
  240d974a
03 December, 2018 1 commit
- Y
  Implement the fusion of convolution and bias for mkldnn · 64e261c6
  Committed by Yihua Xu on December 03, 2018
```
(test=develop)
```
  64e261c6
26 November, 2018 1 commit
- M
  Revert the changes of VLOG · 53433d7f
  Committed by minqiyang on November 26, 2018
```
test=develop
```
  53433d7f
16 November, 2018 1 commit

MKLDNN residual connections fuse pass: · 7423748e

Committed by Tomasz Patejko on November 06, 2018

* implements reachability check between identity node and non-identity argument to elementwise_add
* implements handling identity node as x and as y argument to elementwise_add
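
The reachability check mentioned above can be pictured as a plain graph search: before fusing, test whether one node can be reached from another by following edges. The sketch below is a generic breadth-first search over an adjacency dictionary; the graph representation and node names are assumptions for illustration, not the data structures used by the pass itself.

```python
from collections import deque


def is_reachable(graph, start, target):
    """Return True if target can be reached from start by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


# Hypothetical residual block: the identity branch skips the conv.
graph = {
    "identity": ["conv", "elementwise_add"],
    "conv": ["conv_out"],
    "conv_out": ["elementwise_add"],
    "elementwise_add": ["out"],
}
```

With this graph, `is_reachable(graph, "identity", "conv_out")` is true while `is_reachable(graph, "conv_out", "identity")` is false, which is the kind of directional information a fuse pass needs to tell the identity argument apart from the other one.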

7423748e

12 November, 2018 2 commits
- T
  speedup DetectPatterns · 668ae523
  Committed by Tao Luo on November 12, 2018
```
test=develop
```
  668ae523
- Y
  
  fix mac graph detector sort (#14356) · 9a6e2392
  Committed by Yan Chunwei on November 12, 2018
  
  9a6e2392
09 November, 2018 1 commit

Exhaustive search for cuDNN conv. (#14286) · abe20923

Committed by qingqing01 on November 09, 2018

* exhaustive search for cuDNN conv.
* Refine code and add unit testing.
* Fix model load in fluid/inference and unit testing in conv2d
* Follow comments.
* Fix compiling test=develop
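
In general terms, an exhaustive search over convolution algorithms means measuring every candidate and keeping the fastest one. The sketch below shows only that selection logic, with the cost measurement passed in as a function; the algorithm names and costs are made up for illustration, and nothing here calls cuDNN or mirrors Paddle's implementation.

```python
def exhaustive_search(candidates, measure):
    """Try every candidate and return the one with the lowest measured cost."""
    best_name, best_cost = None, float("inf")
    for name in candidates:
        cost = measure(name)              # e.g. elapsed time of a trial run
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# Hypothetical per-algorithm costs in milliseconds.
fake_costs = {"implicit_gemm": 3.2, "fft": 5.1, "winograd": 1.9}
chosen = exhaustive_search(fake_costs, fake_costs.__getitem__)
```

Because every candidate is tried, the search costs more up front than a heuristic choice, so the result is typically cached per input shape and reused.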

abe20923

08 November, 2018 1 commit
- M
  Multiply the original VLOG levels by 10 · 0c3227a5
  Committed by minqiyang on November 08, 2018
```
Fix code to support cpplint syntax check

test=develop
```
  0c3227a5
07 November, 2018 2 commits
- Q
  Revert " Exhaustive search for cuDNN conv. (#14043)" · db8c52da
  Committed by qingqing01 on November 07, 2018
```
This reverts commit ce7d9b07.
```
  db8c52da
- Q
  Exhaustive search for cuDNN conv. (#14043) · ce7d9b07
  Committed by qingqing01 on November 07, 2018
```
* exhaustive search for cuDNN conv.
* Refine code and add unit testing.
* Clean code
* Fix model load in fluid/inference and unit testing in conv2d
* Follow comments.
```
  ce7d9b07
02 November, 2018 1 commit
- Y
  
  fix graph pattern detector (#14186) · f76fee64
  Committed by Yan Chunwei on November 01, 2018
  
  f76fee64
21 October, 2018 8 commits
- T
  MKLDNN conv + elementwise_add fusion: UT for missing bias added. UTs... · ce2464fd
  Committed by Tomasz Patejko on October 19, 2018
```
MKLDNN conv + elementwise_add fusion: UT for missing bias added. UTs refactored. Some minor changes in the pass
```
  ce2464fd
- T
  
  MKLDNN conv + elementwise_add fusion: fix for crash when bias is not present · 4e72ab41
  Committed by Tomasz Patejko on October 19, 2018
  
  4e72ab41
- T
  MKLDNN conv + elementwise_add fusion: fix for order of parameters in elementwise_add in resnet50 · 0fe3079c
  Committed by Tomasz Patejko on October 16, 2018
```
test=develop
```
  0fe3079c
- T
  MKLDNN conv + elementwise_add fusion: new nodes marked as input or output · 8fb29b2c
  Committed by Tomasz Patejko on September 28, 2018
```
test=develop
```
  8fb29b2c
- T
  
  MKLDNN conv + elementwise_add fusion: changed a name of a formal argument in ElementwiseAdd pattern · 9a335e02
  Committed by Tomasz Patejko on September 27, 2018
  
  9a335e02
- T
  
  MKLDNN conv + elementwise_add fusion: implementation changed to conform with Paddle API · efd76614
  Committed by Tomasz Patejko on September 26, 2018
  
  efd76614
- T
  refine fuse pattern and attr · 40f8456a
  Committed by tensor-tang on October 21, 2018
```
test=develop
```
  40f8456a
- T
  
  add seqconv eltadd relu pass · 603ba5e0
  Committed by tensor-tang on October 19, 2018
  
  603ba5e0

BaiXuePrincess / Paddle is in sync with the source project it was forked from