Commits · bc9fd1fc10f5ff3a2310cd1e04b40309ce508cf7 · PaddlePaddle / PaddleDetection

08 Jul, 2019 1 commit

CHERRY-Pick: Inference: fix mask rcnn model diff, optim memory usage, memory leak. #18532 (#18547) · bc9fd1fc

Committed by Zhaolong Xing on Jul 08, 2019

fix mask rcnn
add an interface for setting optim_cache_dir (e.g., in TRT int8 mode with the model loaded from memory, there should be an interface for setting the TRT calibration table data dir)

test=release/1.5

bc9fd1fc

05 Jul, 2019 1 commit
- checkerrpick Make fuse_all_reduce_op_pass support mix_precision test=develop test=release (#18490) · 3232618a
  Committed by gongweibao on Jul 05, 2019
  
  3232618a
28 Jun, 2019 1 commit

Update the Anakin interfaces for content-dnn and MLU, test=release/1.5 (#18028) · 924e53b7

Committed by 石晓伟 on Jun 28, 2019

* Update the Anakin interfaces for content-dnn and MLU (#17890)

* update anakin-engine interfaces for content-dnn

test=develop

* support only-gpu mode of Anakin

modify eltwise parse

test=develop

* modification for thread-safe

test=develop

* Integrated template instance

test=develop

* increase template parameters

test=develop

* support MLU predictor

test=develop

* update anakin cmake files

test=develop

* update TargetWrapper::set_device

* update the initialization of anakin subgraph

test=develop

* use the default constructor of base class

test=develop

* modify the access level of anakin engine (#18015)

test=develop

* fix ci test cmake test=develop

924e53b7

27 Jun, 2019 1 commit

Cherry pick Fix Bug-prone code of PE (#18355) · b09ba8a7

Committed by chengduo on Jun 27, 2019

* update pe reduce config
test=release/1.5

*  drop the local_exe_scopes of the previous parallel_executor
test=release/1.5

b09ba8a7

26 Jun, 2019 1 commit
- update reduce config (#18335) · 401c03fc
  Committed by chengduo on Jun 26, 2019
```
test=release/1.5
```
  401c03fc
24 Jun, 2019 1 commit
- update alloc_continuous_space_for_grad_pass (#18288) · cac315f9
  Committed by chengduo on Jun 24, 2019
```
test=release/1.5
```
  cac315f9
19 Jun, 2019 3 commits
- [Cherry-pick] Update execution_strategy option default value (#18184) · 6e3c9dd7
  Committed by chengduo on Jun 19, 2019
```
* update execution_strategy option default value
test=release/1.5

* fix doc error
test=release/1.5
```
  6e3c9dd7
- [Cherry Pick] Not init nccl when rank is 1 (#18170) · 041bc72c
  Committed by chengduo on Jun 19, 2019
```
* remove nccl dep when the number of GPU is 1
test=develop

* use multi card run syncBN
test=release/1.5
```
  041bc72c
- add trainer_desc proto DEPS (#18019) (#18130) · 39002b08
  Committed by hutuxian on Jun 19, 2019
```
Add trainer_desc proto DEPS to solve CI random fail.
```
  39002b08
17 Jun, 2019 1 commit

Pipeline Concurrency (#17402) (#17971) · a0732cba

Committed by hutuxian on Jun 17, 2019

cherry-pick for (https://github.com/PaddlePaddle/Paddle/pull/17402)

Add Pipeline Concurrency Train Mode:
- Cpp: pipeline_trainer & section_worker
- Python: PipelineOptimizer
- Add a new data_feed type: PrivateInstantDataFeed
- Add a test demo of pipeline trainer and the test model is gnn
- Do not support win32 now

a0732cba

15 Jun, 2019 1 commit
- [Cherry pick]Update CPU_NUM config (#18110) · be8c82cc
  Committed by chengduo on Jun 15, 2019
```
* update CPU_NUM config
test=develop
```
  be8c82cc
14 Jun, 2019 1 commit
- cherrpick fixncclid 18025 test=release/1.5 (#18093) · 751497db
  Committed by gongweibao on Jun 14, 2019
  
  751497db
13 Jun, 2019 1 commit
- Polish codes of old prs (#17981) · 73eacf3e
  Committed by gongweibao on Jun 13, 2019
  
  73eacf3e
10 Jun, 2019 2 commits
- Remove attribute in Allocator::Allocate (#17878) · 3ece61f7
  Committed by Zeng Jinle on Jun 10, 2019
```
* remove attribute in Allocator::Allocate, test=develop

* fix travis ci error, test=develop
```
  3ece61f7
- Fix FLAGS_fuse_parameter_memory_size unit from Bytes to MBytes. (#17924) · 972c54cd
  Committed by gongweibao on Jun 10, 2019
  
  972c54cd
08 Jun, 2019 1 commit
- Fix sync_batch_norm_op ncclallreduce error! (#17918) · dd4cd352
  Committed by gongweibao on Jun 08, 2019
  
  dd4cd352
06 Jun, 2019 2 commits
- Add backward and optimizer operator dependency pass. (#17746) · fbbdc9cc
  Committed by gongweibao on Jun 06, 2019
  
  fbbdc9cc
- Make ParallelExecutor support Windows GPU (#17787) · 453a49b1
  Committed by wopeizl on Jun 06, 2019
```
* fix the ParallelExecutor on Windows
test=develop
* restrict to use one GPU only under windows
```
  453a49b1
05 Jun, 2019 1 commit

[NGraph] some ngraph updates to enable bert (#17739) · a4c528a3

Committed by baojun on Jun 05, 2019

* delay infershape test=develop

* fall back subblock to paddle test=develop

* fix edge cases test=develop

* remove output duplicates test=develop

* handle reshape2_grad infershape test=develop

a4c528a3

04 Jun, 2019 2 commits
- fix DropLocalExeScopes (#17829) · 43752047
  Committed by chengduo on Jun 04, 2019
```
test=develop
```
  43752047
- enable mkldnn primitive reuse for platform reorder (#17826) · 50326563
  Committed by Leo Zhao on Jun 04, 2019
```
test=develop
```
  50326563
03 Jun, 2019 1 commit
- polish error doc (#17772) · 863c7516
  Committed by chengduo on Jun 03, 2019
```
test=develop
```
  863c7516
31 May, 2019 1 commit

fix prepare context redundant code problem, optimize executor by cach… (#17743) · d5239109

Committed by guru4elephant on May 31, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* cache sub_scope, program, var when use_program_cache=True is set

* make fetch_list runnable with variables, add more unit tests for use_program_cache

d5239109

30 May, 2019 2 commits

Add Event in ScopeBuffer Executor (#17667) · 67c8dade
Committed by chengduo on May 30, 2019
```
* add event for fast executor and add threads for scopebuffer executor
test=develop
```
67c8dade

Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236) · 8fd39f3e

Committed by Yiqun Liu on May 30, 2019

* Enhance fused_elementwise_activation op.
test=develop

* Move the api fused_elementwise_activation to contrib.
test=develop

* Add including files.
test=develop

* Add the support of sigmoid in fused_elementwise_activation op.

* Update API.spec.
test=develop

8fd39f3e

29 May, 2019 2 commits
- fix 2dconn test=develop (#17681) · 0d561ef4
  Committed by gongweibao on May 29, 2019
  
  0d561ef4
- Capi for a ngraph engine (#17037) · 5eb81fe5
  Committed by mozga-intel on May 28, 2019
  
  5eb81fe5
28 May, 2019 1 commit

[MKL-DNN] conv_transpose mkldnn bias pass (#17644) · 6d8075ec

Committed by Jacek Czaja on May 28, 2019

* - changes to graph detector

- Changes to pass

- Added ut for new pass

- use_pass

- Added pass to mkldnn passes

- fix to registration

- improved verbose messaging for conv bias passes

- Lint fixes

test=develop

* - Lint fixes

test=develop

6d8075ec

27 May, 2019 3 commits

add Concat quantization (#17448) · 96845d21

Committed by Sylwester Fraczek on May 27, 2019

* add Concat quantization
add unit test for quantizing concat
fix for wrong value when the input is not in map of calculated scales
add use_quantizer to concat_op.cc
add scale_algo rules for concat

test=develop

* missing fix for multiple inputs quantize-squash

* wojtuss review fix: adding comment

test=develop

96845d21

Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950
Committed by gongweibao on May 27, 2019

65bbf950

Code clean of Allocator (#17602) · 4aa931dd

Committed by Zeng Jinle on May 27, 2019

* Revert "Revert "Fix allocator bug""

This reverts commit 174d0d0b.

* Revert "fix travis ci"

This reverts commit 5656fa9f.

test=develop

* add inlined_vector.h, test=develop

* add inlined_vector_test,test=develop

* clean code of allocator,test=develop

* delete zero_size_allocator.h,test=develop

* fix failed unittest,test=develop

4aa931dd

25 May, 2019 1 commit

TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc

Committed by Zhaolong Xing on May 25, 2019

* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter

* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.

* 3. add delete_quant_dequant_pass for trt

test=develop

* 4. add the missing file
test=develop

* 5. I modified the C++ interface but forgot to update the pybind code
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop

61221ebc

24 May, 2019 5 commits

[MKL-DNN] Add Fully Connected Op for inference only(#15226) · 0c39b97b

Committed by Michał Gallus on May 24, 2019

* fuse mul and elementwise add to fc

* Reimplement the FC forward operator

* Fix FC MKLDNN integration by transposing weights

* Add FC MKLDNN Pass

test=develop

* FC MKLDNN Pass: change memcpy to std::copy

* Fix MKLDNN FC handling of mismatch input and weights dims

* Lower tolerance for MKL-DNN in resnet50 test

test=develop

* Adjust FC to support MKLDNN Op placement

test=develop

* Adjust Placement Op to set use_mkldnn attribute for graph

test=develop

* MKLDNN FC: fix weights format so that gemm version is called

test=develop

* FC MKLDNN: Remove tolerance decrease from tester_helper

* FC MKL-DNN: Refactor the code, change input reorder to weight reorder

* MKL-DNN FC: Introduce operator caching

test=develop

* FC MKL-DNN: Fix the tensor type in ExpectedKernelType

test=develop

* FC MKL-DNN: fix style changes

test=develop

* FC MKL-DNN: fallback to native on non-supported dim sizes

test=develop

* FC MKLDNN: fix CMake paths

test=develop

* FC MKLDNN: Refine placement pass graph mkldnn attribute

test=develop

* Fix Transpiler error for fuse_conv_eltwise

test=develop

* Fix missing STL includes in files

test=develop

* FC MKL-DNN: Enable new output size computation

Also, refine pass to comply with newest interface.
test=develop

* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled

* FC MKL-DNN: Allow Weights to use oi or io format

* FC MKL-DNN: Adjust UT to work with correct dims

test=develop

* Enable MKL DEBUG for resnet50 analyzer

test=develop

* FC MKL-DNN: Improve Hashing function

test=develop

* FC MKL-DNN: Fix shape for fc weights in transpiler

* FC MKL-DNN: Update input pointer in re-used fc primitive

* Add log for not handling fc fuse for unsupported dims

test=develop

* FC MKL-DNN: Move transpose from pass to Op Kernel

test=develop

* FC MKL-DNN: Disable transpose in unit test

test=develop

* FC MKL-DNN: Remove fc_mkldnn_pass from default list

* Correct Flag for fake data analyzer tests

test=develop

* FC MKL-DNN: Add comment about fc mkldnn pass disablement

test=develop

* FC MKL-DNN: Disable fc in int8 tests

test=develop

0c39b97b

add __str__ method for tensor and lodtensor to support print test=dev… (#17588) · 6724a652
Committed by wopeizl on May 24, 2019
```
* add __str__ method for tensor and lodtensor to support print test=develop
```
6724a652

Conv concat relu quantization (#17466) · 5b2a3c4b

Committed by Sylwester Fraczek on May 24, 2019

* add conv_concat_relu fuse

test=develop

* add test code

test=develop

* added missing include with unordered_map

test=develop

* review fixes for wojtuss

test=develop

* remove 'should (not) be fused' comment statements

one of them was invalid anyway

test=develop

5b2a3c4b

fix quantize_squash_pass segfault when no tensor linked to Bias (#17292) · bccb0ba4

Committed by Sylwester Fraczek on May 24, 2019

* fix quantize_squash_pass segfault when there is no tensor linked to Bias input

test=develop

* add googlenet test

test=develop

* fix concat CreateKey not using input format

test=develop

bccb0ba4

polish_executor_and_add_ctx_cache (#17536) · 7f8bc49d
Committed by guru4elephant on May 24, 2019
```
* polish_executor_and_add_ctx_cache
```
7f8bc49d

23 May, 2019 2 commits

Fix allocator bug (#16712) · c6189637

Committed by Zeng Jinle on May 23, 2019

* Revert "Revert "Fix allocator bug""

This reverts commit 174d0d0b.

* Revert "fix travis ci"

This reverts commit 5656fa9f.

test=develop

* add inlined_vector.h, test=develop

* add inlined_vector_test,test=develop

c6189637

Async exe support communicator (#17386) · 58f7695a
Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
58f7695a

22 May, 2019 1 commit

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

Committed by guomingz on May 22, 2019

* ReLU6 is the bottleneck op for MobileNet-v2. Since MKL-DNN supports conv/relu6 fusion, we implement the fusion as a pass. Because int8 support for this fusion will only arrive in MKL-DNN v0.20, this PR focuses on the fp32 optimization.

The table below shows the benchmark (FPS) measured on SKX-8180 (28 cores):

Batch size | with fusion | without fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280

test=develop

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Embed the parameter definition in the code,
which makes the code easier to understand.

test=develop

2281ebf0
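
As a quick arithmetic check on the benchmark table in commit 2281ebf0, the speedup from the conv/relu6 fusion can be computed directly from the reported FPS figures. This is a minimal sketch: the numbers are copied from the table, while the dictionary layout and variable names are illustrative and not part of the PR.

```python
# FPS figures copied from the benchmark table in commit 2281ebf0
# (SKX-8180, 28 cores). The names used here are illustrative only.
fps = {
    1: {"with_fusion": 214.7, "without_fusion": 53.4},
    50: {"with_fusion": 1219.727, "without_fusion": 137.280},
}

# Speedup = FPS with fusion / FPS without fusion, per batch size.
speedups = {
    batch: row["with_fusion"] / row["without_fusion"]
    for batch, row in fps.items()
}

for batch, s in speedups.items():
    print(f"batch size {batch}: {s:.2f}x")
```

Under these figures the fusion gives roughly a 4x gain at batch size 1 and close to 9x at batch size 50.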

PaddlePaddle / PaddleDetection synced successfully about 1 year ago
