提交 · 2281ebf0f3c50a3ba5398632a3e3bc344ca634f2 · BaiXuePrincess / Paddle

22 5月, 2019 1 次提交

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

由 guomingz 提交于 5月 22, 2019

* Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization.

Below table shows the benchmark(FPS) which measured on skx-8180(28 cores)
Batch size | with fusion | without fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280

test=develop

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Let the parameter definition embedded into the code.
That's will make the code easy to understand.

test=develop

2281ebf0

20 5月, 2019 1 次提交
- T
  remove unused expected_kernel_cache_pass (#17486) · 32da5e9c
  由 Tao Luo 提交于 5月 20, 2019
```
test=develop
```
  32da5e9c
08 5月, 2019 1 次提交
- C
  Code Clean: Move all pass to paddle::framework::ir (#17228) · 04bd413a
  由 chengduo 提交于 5月 08, 2019
```
* move pass to ir

* polish code
test=develop

* fix dependency
test=develop
```
  04bd413a
07 5月, 2019 1 次提交

石

Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a

由石晓伟提交于 5月 07, 2019

* cherry-pick commit from 88770542

* cherry-pick commit from 3f0b97df

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc2)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e1)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab180)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa423)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066f)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3ec)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea89)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a06)

* Modify symbol export rules

test=develop

a72dbe9a

28 3月, 2019 1 次提交

Anakin ssd support · d065b5bf

由 nhzlx 提交于 3月 28, 2019

refine trt first run
add quant dequant fuse pass
omit simplify_anakin_priorbox_detection template
omit transpose_flatten_concat_fuse template
test=develop

d065b5bf

25 3月, 2019 1 次提交
- W
  Move cpu_quantize_* passes into mkldnn subfolder · 46677fb0
  由 Wojciech Uss 提交于 3月 25, 2019
```
test=develop
```
  46677fb0
21 3月, 2019 1 次提交
- L
  add expected_kernel_cache_pass · 056599a7
  由 luotao1 提交于 3月 21, 2019
```
test=develop
```
  056599a7
20 3月, 2019 3 次提交
- N
  
  cherry-pick from feature/anakin-engine: refine paddle-anakin to new interface. #16276 · c407dfa3
  由 nhzlx 提交于 3月 20, 2019
  
  c407dfa3
- N
  
  cherry-pick from feature/anakin-engine: deal the changing shape when using anakin #16189 · a25331bc
  由 nhzlx 提交于 3月 20, 2019
  
  a25331bc
- N
  
  cherry-pick from feature/anakin-engine: Anakin support facebox #16111 · a1d200a5
  由 nhzlx 提交于 3月 20, 2019
  
  a1d200a5
19 3月, 2019 4 次提交
- L
  add runtime_context_cache_pass · 82af8031
  由 luotao1 提交于 3月 19, 2019
```
test=develop
```
  82af8031
- Z
  add allocator flags · 22715487
  由 zhhsplendid 提交于 3月 19, 2019
```
test=develop
```
  22715487
- T
  
  Revert "cache runtime_context" · 7d2740db
  由 Tao Luo 提交于 3月 19, 2019
  
  7d2740db
- W
  Add cpu_quantize_placement_pass for C-API quantization (#16265) · af030088
  由 Wojciech Uss 提交于 3月 19, 2019
```
* Add cpu_quantize_placement_pass for C-API quantization

test=develop

* added a comment on required pass attributes

test=develop
```
  af030088
18 3月, 2019 1 次提交

Add cpu_quantize_pass for C-API quantization (#16127) · 2579ade4

由 Wojciech Uss 提交于 3月 18, 2019

* Add cpu_quantize_pass for C-API quantization

test=develop

* add cpu_quantize_pass test

* fix lint: add include memory unorderd_map and unordered_set

test=develop

* fuse_relu 1

test=develop

* tuned 2 without squash

* fixes

test=develop

* remove unused vars

test=develop

* refactored

test=develop

* fix lint c-style cast -> C++ style cast

test=develop

* remove QuantMax and c style casts

test=develop

* last usage of QuantMax removed

test=develop

* Fix Analysis Predictor UT

Check if memory_optimize_pass has already been added
to the analysis config before adding a new one, so
that it is not added multiple times.
test=develop

* change map to unordered_map

fix the forgotten part of cpu_quantize_pass_tester.cc

test=develop

* removed quantized attribute

* fixed cpu_quantize_pass_tester and op attr comments

test=develop

* removed redundant line

test=debug

* removed gmock

test=develop

* fix after merge

2579ade4

16 3月, 2019 1 次提交
- Q
  Fix windows compiling (#16230) · 86e912c5
  由 qingqing01 提交于 3月 16, 2019
```
test=develop
```
  86e912c5
15 3月, 2019 1 次提交

Support sync batch norm. (#16121) · 8ad672a2

由 qingqing01 提交于 3月 15, 2019

* Support Sync Batch Norm.
* Note, do not enable it in one device.

Usage:

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)

8ad672a2

14 3月, 2019 1 次提交

Add cpu_quantize_squash_pass for C-API quantization (#16128) · b9252f3d

由 Wojciech Uss 提交于 3月 14, 2019

* Add cpu_quantize_squash_pass for C-API quantization

test=develop

* add cpu_quantize_squash_pass teste

* fix lint: add include memory unorderd_map and unordered_set

test=develop

* lint fix 2

* fixes

test=develop

* refactored

test=develop

* fix windows ci

test=develop

b9252f3d

13 3月, 2019 1 次提交
- L
  add runtime_context_cache_pass · d94fd972
  由 luotao1 提交于 3月 13, 2019
```
test=develop
```
  d94fd972
26 2月, 2019 1 次提交
- K
  Add MKL-DNN placement pass tester · 72253391
  由 Krzysztof Binias 提交于 2月 26, 2019
```
test=develop
```
  72253391
22 2月, 2019 1 次提交

MKL-DNN: Add test for conv bias fuse pass (#15824) · c4faf36e

由 Michał Gallus 提交于 2月 22, 2019

* MKL-DNN: Add test for conv bias fuse pass

test=develop

* Remove const cast from Conv Bias Pass Test

* Add conv with bias test case for conv+bias fuse ut

test=develop

c4faf36e

31 1月, 2019 1 次提交
- Y
  
  fix save_inferece_model bug (#15365) · 897789b1
  由 Yan Chunwei 提交于 1月 31, 2019
  
  897789b1
29 1月, 2019 1 次提交
- K
  Make separate folders for mkldnn codes · b1bdcd4d
  由 Krzysztof Binias 提交于 1月 28, 2019
```
test=develop
```
  b1bdcd4d
21 1月, 2019 1 次提交

Memory optimization of depthwise conv op and group norm op (#15313) · 9f8f0fc2

由 Dun 提交于 1月 21, 2019

* mem opt

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine with cub test=develop

* fix mkldnn test && remove comments && test=develop

* polish code && test=develop

* add only_forward test && test=develop

9f8f0fc2

14 1月, 2019 1 次提交
- T
  
  add fuse pass of sequared mat sub fusion · a5d2a6d1
  由 tensor-tang 提交于 1月 13, 2019
  
  a5d2a6d1
13 1月, 2019 1 次提交
- T
  
  add repeated fc relu pass · a89296ac
  由 tensor-tang 提交于 1月 12, 2019
  
  a89296ac
11 1月, 2019 1 次提交
- Z
  
  add_transpose_flatten_concat_fuse (#15121) · 98e85f37
  由 Zhaolong Xing 提交于 1月 11, 2019
  
  98e85f37
10 1月, 2019 1 次提交
- T
  add seqpool concat fuse pass tester · a0a27bd2
  由 tensor-tang 提交于 1月 09, 2019
```
test=develop
```
  a0a27bd2
08 1月, 2019 1 次提交
- T
  add seqpool concat fuse pass · 72d2a180
  由 tensor-tang 提交于 1月 07, 2019
```
test=develop
```
  72d2a180
07 1月, 2019 1 次提交
- M
  Add no lock optimize pass · 4bfa110f
  由 minqiyang 提交于 1月 07, 2019
```
test=develop
```
  4bfa110f
25 12月, 2018 1 次提交
- N
  add affine_channel fuse. · ce3782c1
  由 nhzlx 提交于 12月 25, 2018
```
fix conv+elemenwise fuse bug.
```
  ce3782c1
16 12月, 2018 1 次提交
- N
  add conv+elementwiseadd pass · 4e4a7772
  由 nhzlx 提交于 12月 16, 2018
```
test=develop
```
  4e4a7772
14 12月, 2018 1 次提交
- Y
  
  Fea/fuse conv elementwise add fuse (#14669) · a985949b
  由 Yan Chunwei 提交于 12月 14, 2018
  
  a985949b
07 12月, 2018 1 次提交
- Y
  Clean Code · 240d974a
  由 Yihua Xu 提交于 12月 07, 2018
```
test=develop
```
  240d974a
03 12月, 2018 1 次提交
- Y
  Implement the fusion of convolution and bias for mkldnn · 64e261c6
  由 Yihua Xu 提交于 12月 03, 2018
```
(test=develop)
```
  64e261c6
15 11月, 2018 1 次提交

add mkldnn prop_kind phase for inference-only case to pooling and activations (#14278) · 8a1eeec5

由 Sylwester Fraczek 提交于 11月 15, 2018

* add is_test to pooling and activations

add prop_kind support for layers activation. conv and pooling

add a pass that sets is_test to true

add transpiler version of is_test pass

test=develop

* patch test and pass

test=develop

* add pass to analyzer.h

test=develop

* add is_test attr description & pass only on mkldnn

in:
activation_op.cc
batch_norm_op.cc
conv_op.cc
dropout_op.cc
lrn_op.cc
pool_op.cc
sequence_pool_op.cc
softmax_op.cc

* fix is_test handling for activation pool and conv

* change description of is_test for all layers again

* remove GetAttr(use_mkldnn) from pass

* rename correct_mkldnn_test_phase to is_test

and remove dependency on MKLDNN
test=develop

* review fix magic number

* two if(..)s into one

* Check is_test once and pass mkldnn forward prop kind

* dereference shared_ptr with * (without get())

test=develop

* add is_test_pass back

test=develop

8a1eeec5

14 11月, 2018 1 次提交
- Y
  
  Combine Inference Analysis with IR (#13914) · 9f252e00
  由 Yan Chunwei 提交于 11月 14, 2018
  
  9f252e00
06 11月, 2018 1 次提交
- X
  add tests · 25123a3b
  由 Xin Pan 提交于 11月 06, 2018
```
test=develop
```
  25123a3b
31 10月, 2018 1 次提交

add depthwise conv mkldnn pass · 4e2aaf01

由 Sylwester Fraczek 提交于 10月 30, 2018

added depthwise conv mkldnn pass which for MKLDNN changes depthwise_conv operator to conv operator because for mkldnn this is the same api
test=develop

4e2aaf01

29 10月, 2018 1 次提交

[1.1] [project] train imagenet using large batch size (#13766) · 26200f2e

由 Wu Yi 提交于 10月 29, 2018

* fix nccl2 lars dist support

* put lars in momentum op

* add tests lars

* fix ci

* fix cpu kernel

* soft warning

* remove lars in test_recognize_digits.py

* move to another op

* add file

* update api.spec test=develop

* update test=develop

* fix api.spec test=develop

* wip

* wip, finish grad merge ops

* wip, finish graph build

* wip test running

* work on 1 gpu

* workable version

* update

* fix tests

* fuse broadcast op

* fix compile failed

* refine

* add batch merge test mnist

* fix CI test=develop

* fix build

* use independent bn params for batch merge test=develop

* update api.spec

* follow comments and for test

* wip

* refine tests test=develop

* follow comments test=develop

* remove startup bn modify test=develop

* follow comments test=develop

* fix merge test=develop

26200f2e

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致