Commits · 2281ebf0f3c50a3ba5398632a3e3bc344ca634f2 · BaiXuePrincess / Paddle

22 May, 2019 1 commit

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

Committed by guomingz on May 22, 2019

* Relu6 is the bottleneck op for MobileNet-v2. Since MKLDNN supports the conv/relu6 fusion, we implement this fusion as a pass. Because int8 support for this fusion will only be available in MKLDNN v0.20, this PR focuses on the fp32 optimization.

The table below shows the benchmark (FPS) measured on skx-8180 (28 cores):
Batch size | with fusion | without fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280

test=develop

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Embed the parameter definitions in the code.
That makes the code easier to understand.

test=develop

2281ebf0
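
To make the idea behind the commit above concrete, here is a minimal sketch of a convolution followed by relu6 (bounded ReLU with an upper bound of 6) next to a fused variant that clamps each output as it is produced. It is a toy pure-Python illustration, not the MKLDNN kernel or the Paddle pass: the function names `relu6`, `conv1d` and `conv1d_relu6_fused`, the 1-D shapes and the sample data are all assumptions made for this example. The last two lines only restate the speedups implied by the benchmark table.

```python
def relu6(x, threshold=6.0):
    """Bounded ReLU: clamp x to the range [0, threshold]."""
    return min(max(x, 0.0), threshold)


def conv1d(signal, kernel):
    """Plain 1-D 'valid' convolution (cross-correlation form), no activation."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def conv1d_relu6_fused(signal, kernel, threshold=6.0):
    """Toy fused version: clamp each output as soon as it is computed,
    so no intermediate (unclamped) buffer is written and read back."""
    n = len(signal) - len(kernel) + 1
    out = []
    for i in range(n):
        acc = sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        out.append(min(max(acc, 0.0), threshold))
    return out


# Hypothetical sample data, chosen so both clamp bounds are exercised.
signal = [1.0, -2.0, 3.0, 4.0, 5.0, -20.0]
kernel = [0.5, 1.0, 0.5]

unfused = [relu6(v) for v in conv1d(signal, kernel)]
fused = conv1d_relu6_fused(signal, kernel)

# Speedups implied by the benchmark table (FPS with fusion / FPS without).
speedup_bs1 = 214.7 / 53.4
speedup_bs50 = 1219.727 / 137.280
```

Both variants produce the same values; the fused one simply avoids a second pass over the output. From the table, the reported gain is roughly 4x at batch size 1 and roughly 8.9x at batch size 50.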

20 May, 2019 2 commits
- remove two useless flags: enable_subgraph_optimize, memory_optimize_debug, test=develop (#17491) · c3949f56
  Committed by liuwei1031 on May 20, 2019
  
  c3949f56
- remove unused expected_kernel_cache_pass (#17486) · 32da5e9c
  Committed by Tao Luo on May 20, 2019
```
test=develop
```
  32da5e9c
16 May, 2019 2 commits
- fix recurrent_op,test=develop (#17433) · 712bfb17
  Committed by Zeng Jinle on May 16, 2019
  
  712bfb17
- Add setting Scope function for the graph class (#17417) · 4a1b7fec
  Committed by Zhen Wang on May 16, 2019
```
* add set_not_owned function for graph

* add scope set. test=develop

* add scope_ptr enforce not null before setting.test=develop
```
  4a1b7fec
08 May, 2019 1 commit
- Code Clean: Move all pass to paddle::framework::ir (#17228) · 04bd413a
  Committed by chengduo on May 08, 2019
```
* move pass to ir

* polish code
test=develop

* fix dependency
test=develop
```
  04bd413a
07 May, 2019 2 commits

Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225) · 4f859408

Committed by Zeng Jinle on May 07, 2019

* add use_cuda to inplace pass,test=develop

* add test softmax_with_xe_inplace test,test=develop

* fix potential inplace bug
test=develop

* add more skip vars in mem opt pass,test=develop

* follow comment,test=develop

* follow comments,move duplicate out arg check to program->graph,test=develop

4f859408

Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a

Committed by 石晓伟 on May 07, 2019

* cherry-pick commit from 88770542

* cherry-pick commit from 3f0b97df

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc2)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e1)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab180)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa423)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066f)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3ec)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea89)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a06)

* Modify symbol export rules

test=develop

a72dbe9a

30 Apr, 2019 2 commits

fix bn fuse vardesc and add model saver (#17143) · 79ed1c76

Committed by tensor-tang on Apr 30, 2019

* fix bn fuse vardesc and add model saver

test=develop

* unify save model in test helper

test=develop

* fix mkdir on windows

test=develop

* remove magic number use bn bias var desc

test=develop

79ed1c76

Rewrite inplace pass and fix gc bug (#17126) · 4e1bc6e8

Committed by Zeng Jinle on Apr 29, 2019

* fix op graph view
test=develop

* rewrite inplace pass and fix reference count pass bug
test=develop

* fix unittest failed
test=develop

* follow comments, test=develop

4e1bc6e8

12 Apr, 2019 1 commit

Fix the order while sorting the operators (#16756) · 93cedfdb

Committed by Yihua Xu on Apr 12, 2019

* Fix the order when sorting operators.

test=develop

* Enable transformer compare test item.

test=develop

* Use set to replace vector.

test=develop

93cedfdb

11 Apr, 2019 1 commit

Add an option to enable the cache of expected kernel in train phase. (#16724) · 112f1614

Committed by Yiqun Liu on Apr 11, 2019

* Add an option to enable the cache of expected kernel in train phase.
test=develop

* Change the default value of cache_expected_kernel to true.

112f1614

08 Apr, 2019 1 commit

Enable the runtime_context_cache pass in train phase (#16640) · 3fe8cb0d

Committed by Yiqun Liu on Apr 08, 2019

* Try to enable the runtime_context_cache pass in train phase.

* Put the append of runtime_context_cache pass ahead of multi_dev passes.
test=develop

3fe8cb0d

04 Apr, 2019 1 commit
- update expected_kernel_cache_pass · 695f2db6
  Committed by luotao1 on Apr 04, 2019
```
test=develop
```
  695f2db6
02 Apr, 2019 1 commit
- fix batch merge bug (#16601) · 423bc515
  Committed by gongweibao on Apr 02, 2019
  
  423bc515
28 Mar, 2019 2 commits

Anakin ssd support · d065b5bf

Committed by nhzlx on Mar 28, 2019

refine trt first run
add quant dequant fuse pass
omit simplify_anakin_priorbox_detection template
omit transpose_flatten_concat_fuse template
test=develop

d065b5bf

Fix the interface of Pass::Apply (#16484) · ed61d67c

Committed by chengduo on Mar 27, 2019

* modify the interface of Pass::Apply
test=develop

* Polish code
test=develop

* Fix Travis CI
test=develop

* fix Pass::Apply interface
test=develop

* Fix Travis CI
test=develop

ed61d67c

27 Mar, 2019 1 commit
- fix cpplint test=develop · 392e97aa
  Committed by Qiao Longfei on Mar 27, 2019
  
  392e97aa
25 Mar, 2019 1 commit
- Move cpu_quantize_* passes into mkldnn subfolder · 46677fb0
  Committed by Wojciech Uss on Mar 25, 2019
```
test=develop
```
  46677fb0
21 Mar, 2019 2 commits
- add expected_kernel_cache_pass · 056599a7
  Committed by luotao1 on Mar 21, 2019
```
test=develop
```
  056599a7
- Add enabling quantization (#16326) · cbe2dbf0
  Committed by Wojciech Uss on Mar 21, 2019
```
* Add enabling quantization

test=develop

* remove unused (here) function
```
  cbe2dbf0
20 Mar, 2019 5 commits
- cherry-pick from feature/anakin-engine: refine paddle-anakin to new interface. #16276 · c407dfa3
  Committed by nhzlx on Mar 20, 2019
  
  c407dfa3
- cherry-pick from feature/anakin-engine: deal with the changing shape when using anakin #16189 · a25331bc
  Committed by nhzlx on Mar 20, 2019
  
  a25331bc
- cherry-pick from feature/anakin-engine: refine anakin subgraph. #16157 · 69d37f81
  Committed by nhzlx on Mar 20, 2019
```
support change input size
```
  69d37f81
- cherry-pick from feature/anakin-engine: Anakin support facebox #16111 · a1d200a5
  Committed by nhzlx on Mar 20, 2019
  
  a1d200a5
- fix pattern matching conv2d with(out) ResidualData · 104a9f1e
  Committed by Wojciech Uss on Mar 20, 2019
```
test=develop
```
  104a9f1e
19 Mar, 2019 4 commits
- add runtime_context_cache_pass · 82af8031
  Committed by luotao1 on Mar 19, 2019
```
test=develop
```
  82af8031
- add allocator flags · 22715487
  Committed by zhhsplendid on Mar 19, 2019
```
test=develop
```
  22715487
- Revert "cache runtime_context" · 7d2740db
  Committed by Tao Luo on Mar 19, 2019
  
  7d2740db
- Add cpu_quantize_placement_pass for C-API quantization (#16265) · af030088
  Committed by Wojciech Uss on Mar 19, 2019
```
* Add cpu_quantize_placement_pass for C-API quantization

test=develop

* added a comment on required pass attributes

test=develop
```
  af030088
18 Mar, 2019 4 commits

Polish code style · b40e41fb
Committed by minqiyang on Mar 18, 2019
```
test=develop
```
b40e41fb
Take DataType and VarType apart · 36dce65b
Committed by minqiyang on Mar 18, 2019
```
test=develop
```
36dce65b
refine with comments · cc0ae1f1
Committed by luotao1 on Mar 18, 2019
```
test=develop
```
cc0ae1f1

Add cpu_quantize_pass for C-API quantization (#16127) · 2579ade4

Committed by Wojciech Uss on Mar 18, 2019

* Add cpu_quantize_pass for C-API quantization

test=develop

* add cpu_quantize_pass test

* fix lint: add includes for memory, unordered_map and unordered_set

test=develop

* fuse_relu 1

test=develop

* tuned 2 without squash

* fixes

test=develop

* remove unused vars

test=develop

* refactored

test=develop

* fix lint c-style cast -> C++ style cast

test=develop

* remove QuantMax and c style casts

test=develop

* last usage of QuantMax removed

test=develop

* Fix Analysis Predictor UT

Check if memory_optimize_pass has already been added
to the analysis config before adding a new one, so
that it is not added multiple times.
test=develop

* change map to unordered_map

fix the forgotten part of cpu_quantize_pass_tester.cc

test=develop

* removed quantized attribute

* fixed cpu_quantize_pass_tester and op attr comments

test=develop

* removed redundant line

test=debug

* removed gmock

test=develop

* fix after merge

2579ade4
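
The "Fix Analysis Predictor UT" item in the commit above describes checking whether memory_optimize_pass is already in the analysis config before adding it again. The pattern can be sketched as follows; the `PassBuilder` class and its `append_pass_once` method are hypothetical names invented for this illustration and are not Paddle's actual API.

```python
class PassBuilder:
    """Hypothetical stand-in for the pass list held by an analysis config."""

    def __init__(self):
        self.passes = []

    def append_pass_once(self, name):
        """Append a pass only if it is not registered yet.

        Returns True when the pass was added, False when it was already there.
        """
        if name in self.passes:
            return False
        self.passes.append(name)
        return True


builder = PassBuilder()
first = builder.append_pass_once("memory_optimize_pass")
second = builder.append_pass_once("memory_optimize_pass")  # no duplicate added
```

After both calls the list holds a single `memory_optimize_pass` entry, so repeated configuration does not schedule the pass more than once.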

16 Mar, 2019 1 commit
- Fix windows compiling (#16230) · 86e912c5
  Committed by qingqing01 on Mar 16, 2019
```
test=develop
```
  86e912c5
15 Mar, 2019 2 commits

Support sync batch norm. (#16121) · 8ad672a2

Committed by qingqing01 on Mar 15, 2019

* Support Sync Batch Norm.
* Note, do not enable it in one device.

Usage:

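import paddle.fluid as fluid

# tp is the training Program and loss_mean its loss variable, both defined elsewhere.
build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True  # turn on synchronized batch norm
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)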

8ad672a2
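
As a rough illustration of what synchronizing batch norm across devices means, the sketch below combines per-device counts, sums and sums of squares into one global mean and variance, and compares that with the statistics each device would compute alone. It is a pure-Python toy under stated assumptions: the helper names `batch_stats` and `synced_stats` and the sample batches are invented here, and the code says nothing about how Paddle actually exchanges these values between devices.

```python
def batch_stats(values):
    """Mean and (biased) variance of one device's local batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def synced_stats(per_device_batches):
    """Global mean and variance from per-device count, sum and sum of squares."""
    count = sum(len(b) for b in per_device_batches)
    total = sum(sum(b) for b in per_device_batches)
    total_sq = sum(sum(v * v for v in b) for b in per_device_batches)
    mean = total / count
    var = total_sq / count - mean * mean  # E[x^2] - (E[x])^2
    return mean, var


# Hypothetical activations for one channel on two devices.
device_batches = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]

local = [batch_stats(b) for b in device_batches]
global_mean, global_var = synced_stats(device_batches)
single_device = synced_stats([device_batches[0]])
```

Each device alone sees a mean of 2.0 or 8.0, while the synchronized mean is 5.0 with a much larger variance. With a single device the combined statistics equal the local ones, so synchronization adds nothing in that case.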

Implement infer var type context · ca392c7e
Committed by minqiyang on Mar 15, 2019

ca392c7e

14 Mar, 2019 1 commit

Add cpu_quantize_squash_pass for C-API quantization (#16128) · b9252f3d

Committed by Wojciech Uss on Mar 14, 2019

* Add cpu_quantize_squash_pass for C-API quantization

test=develop

* add cpu_quantize_squash_pass test

* fix lint: add includes for memory, unordered_map and unordered_set

test=develop

* lint fix 2

* fixes

test=develop

* refactored

test=develop

* fix windows ci

test=develop

b9252f3d

13 Mar, 2019 1 commit
- add runtime_context_cache_pass · d94fd972
  Committed by luotao1 on Mar 13, 2019
```
test=develop
```
  d94fd972
12 Mar, 2019 1 commit
- Add some fixme. test=develop · 5685a48c
  Committed by Zhen Wang on Mar 12, 2019
  
  5685a48c

BaiXuePrincess / Paddle is in sync with its fork source project
