Commits · 041bc72c55e3f0e9a60a1aee2cfc84ae20f8a342 · BaiXuePrincess / Paddle

19 Jun, 2019 · 2 commits
- [Cherry Pick] Not init nccl when rank is 1 (#18170) · 041bc72c
  Committed by chengduo on Jun 19, 2019
```
* remove nccl dep when the number of GPU is 1
test=develop

* use multi card run syncBN
test=release/1.5
```
- add trainer_desc proto DEPS (#18019) (#18130) · 39002b08
  Committed by hutuxian on Jun 19, 2019
```
Add trainer_desc proto DEPS to solve CI random fail.
```
17 Jun, 2019 · 1 commit

Pipeline Concurrency (#17402) (#17971) · a0732cba

Committed by hutuxian on Jun 17, 2019

Cherry-pick of https://github.com/PaddlePaddle/Paddle/pull/17402

Add Pipeline Concurrency Train Mode:
- Cpp: pipeline_trainer & section_worker
- Python: PipelineOptimizer
- Add a new data_feed type: PrivateInstantDataFeed
- Add a test demo of pipeline trainer and the test model is gnn
- win32 is not supported for now

15 Jun, 2019 · 1 commit
- [Cherry pick]Update CPU_NUM config (#18110) · be8c82cc
  Committed by chengduo on Jun 15, 2019
```
* update CPU_NUM config
test=develop
```
14 Jun, 2019 · 1 commit
- cherrpick fixncclid 18025 test=release/1.5 (#18093) · 751497db
  Committed by gongweibao on Jun 14, 2019
13 Jun, 2019 · 1 commit
- Polish codes of old prs (#17981) · 73eacf3e
  Committed by gongweibao on Jun 13, 2019
10 Jun, 2019 · 2 commits
- Remove attribute in Allocator::Allocate (#17878) · 3ece61f7
  Committed by Zeng Jinle on Jun 10, 2019
```
* remove attribute in Allocator::Allocate, test=develop

* fix travis ci error, test=develop
```
- Fix FLAGS_fuse_parameter_memory_size unit from Bytes to MBytes. (#17924) · 972c54cd
  Committed by gongweibao on Jun 10, 2019
08 Jun, 2019 · 1 commit
- Fix sync_batch_norm_op ncclallreduce error! (#17918) · dd4cd352
  Committed by gongweibao on Jun 08, 2019
06 Jun, 2019 · 2 commits
- Add backward and optimizer operator dependency pass. (#17746) · fbbdc9cc
  Committed by gongweibao on Jun 06, 2019
- Make ParallelExecutor support Windows GPU (#17787) · 453a49b1
  Committed by wopeizl on Jun 06, 2019
```
* fix the ParallelExecutor on Windows
test=develop
* restrict to use one GPU only under windows
```
05 Jun, 2019 · 1 commit

[NGraph] some ngraph updates to enable bert (#17739) · a4c528a3

Committed by baojun on Jun 05, 2019

* delay infershape test=develop

* fall back subblock to paddle test=develop

* fix edge cases test=develop

* remove output duplicates test=develop

* handle reshape2_grad infershape test=develop

04 Jun, 2019 · 2 commits
- fix DropLocalExeScopes (#17829) · 43752047
  Committed by chengduo on Jun 04, 2019
```
test=develop
```
- enable mkldnn primitive reuse for platform reorder (#17826) · 50326563
  Committed by Leo Zhao on Jun 04, 2019
```
test=develop
```
03 Jun, 2019 · 1 commit
- polish error doc (#17772) · 863c7516
  Committed by chengduo on Jun 03, 2019
```
test=develop
```
31 May, 2019 · 1 commit

fix prepare context redundant code problem, optimize executor by cach… (#17743) · d5239109

Committed by guru4elephant on May 31, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* cache sub_scope, program, var when use_program_cache=True is set

* make fetch_list runnable with variables, add more unittests for use_program_cache

30 May, 2019 · 2 commits

Add Event in ScopeBuffer Executor (#17667) · 67c8dade

Committed by chengduo on May 30, 2019
```
* add event for fast executor and add threads for scopebuffer executor
test=develop
```

Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236) · 8fd39f3e

Committed by Yiqun Liu on May 30, 2019

* Enhance fused_elementwise_activation op.
test=develop

* Move the api fused_elementwise_activation to contrib.
test=develop

* Add including files.
test=develop

* Add support for sigmoid in the fused_elementwise_activation op.

* Update API.spec.
test=develop

29 May, 2019 · 2 commits
- fix 2dconn test=develop (#17681) · 0d561ef4
  Committed by gongweibao on May 29, 2019
- Capi for a ngraph engine (#17037) · 5eb81fe5
  Committed by mozga-intel on May 28, 2019
28 May, 2019 · 1 commit

[MKL-DNN] conv_transpose mkldnn bias pass (#17644) · 6d8075ec

Committed by Jacek Czaja on May 28, 2019

* - changes to graph detector

- Changes to pass

- Added ut for new pass

- use_pass

- Added pass to mkldnn passes

- fix to registration

- improved verbose messaging for conv bias passes

- Lint fixes

test=develop

* - Lint fixes

test=develop

27 May, 2019 · 3 commits

add Concat quantization (#17448) · 96845d21

Committed by Sylwester Fraczek on May 27, 2019

* add Concat quantization
add unit test for quantizing concat
fix for wrong value when the input is not in map of calculated scales
add use_quantizer to concat_op.cc
add scale_algo rules for concat

test=develop

* missing fix for multiple inputs quantize-squash

* wojtuss review fix: adding comment

test=develop

Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950

Committed by gongweibao on May 27, 2019

Code clean of Allocator (#17602) · 4aa931dd

Committed by Zeng Jinle on May 27, 2019

* Revert "Revert "Fix allocator bug""

This reverts commit 174d0d0b.

* Revert "fix travis ci"

This reverts commit 5656fa9f.

test=develop

* add inlined_vector.h, test=develop

* add inlined_vector_test,test=develop

* clean code of allocator,test=develop

* delete zero_size_allocator.h,test=develop

* fix failed unittest,test=develop

25 May, 2019 · 1 commit

TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc

Committed by Zhaolong Xing on May 25, 2019

* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter

* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.

* 3. add delete_quant_dequant_pass for trt

test=develop

* 4. add the missing file
test=develop

* 5. I modified the C++ interface but forgot to update the pybind code
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop

24 May, 2019 · 5 commits

[MKL-DNN] Add Fully Connected Op for inference only (#15226) · 0c39b97b

Committed by Michał Gallus on May 24, 2019

* fuse mul and elementwise add to fc

* Reimplement the FC forward operator

* Fix FC MKLDNN integration by transposing weights

* Add FC MKLDNN Pass

test=develop

* FC MKLDNN Pass: change memcpy to std::copy

* Fix MKLDNN FC handling of mismatch input and weights dims

* Lower tolerance for MKL-DNN in resnet50 test

test=develop

* Adjust FC to support MKLDNN Op placement

test=develop

* Adjust Placement Op to set use_mkldnn attribute for graph

test=develop

* MKLDNN FC: fix weights format so that gemm version is called

test=develop

* FC MKLDNN: Remove tolerance decrease from tester_helper

* FC MKL-DNN: Refactor the code, change input reorder to weight reorder

* MKL-DNN FC: Introduce operator caching

test=develop

* FC MKL-DNN: Fix the tensor type in ExpectedKernelType

test=develop

* FC MKL-DNN: fix style changes

test=develop

* FC MKL-DNN: fallback to native on non-supported dim sizes

test=develop

* FC MKLDNN: fix CMake paths

test=develop

* FC MKLDNN: Refine placement pass graph mkldnn attribute

test=develop

* Fix Transpiler error for fuse_conv_eltwise

test=develop

* Fix missing STL includes in files

test=develop

* FC MKL-DNN: Enable new output size computation

Also, refine pass to comply with newest interface.
test=develop

* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled

* FC MKL-DNN: Allow Weights to use oi or io format

* FC MKL-DNN: Adjust UT to work with correct dims

test=develop

* Enable MKL DEBUG for resnet50 analyzer

test=develop

* FC MKL-DNN: Improve Hashing function

test=develop

* FC MKL-DNN: Fix shape for fc weights in transpiler

* FC MKL-DNN: Update input pointer in re-used fc primitive

* Add log for not handling fc fuse for unsupported dims

test=develop

* FC MKL-DNN: Move transpose from pass to Op Kernel

test=develop

* FC MKL-DNN: Disable transpose in unit test

test=develop

* FC MKL-DNN: Remove fc_mkldnn_pass from default list

* Correct Flag for fake data analyzer tests

test=develop

* FC MKL-DNN: Add comment about fc mkldnn pass disablement

test=develop

* FC MKL-DNN: Disable fc in int8 tests

test=develop

add __str__ method for tensor and lodtensor to support print test=dev… (#17588) · 6724a652

Committed by wopeizl on May 24, 2019
```
* add __str__ method for tensor and lodtensor to support print test=develop
```

Conv concat relu quantization (#17466) · 5b2a3c4b

Committed by Sylwester Fraczek on May 24, 2019

* add conv_concat_relu fuse

test=develop

* add test code

test=develop

* added missing include with unordered_map

test=develop

* review fixes for wojtuss

test=develop

* remove 'should (not) be fused' comment statements

one of them was invalid anyway

test=develop

fix quantize_squash_pass segfault when no tensor linked to Bias (#17292) · bccb0ba4

Committed by Sylwester Fraczek on May 24, 2019

* fix quantize_squash_pass segfault when there is no tensor linked to Bias input

test=develop

* add googlenet test

test=develop

* fix concat CreateKey not using input format

test=develop

polish_executor_and_add_ctx_cache (#17536) · 7f8bc49d

Committed by guru4elephant on May 24, 2019
```
* polish_executor_and_add_ctx_cache
```
23 May, 2019 · 2 commits

Fix allocator bug (#16712) · c6189637

Committed by Zeng Jinle on May 23, 2019

* Revert "Revert "Fix allocator bug""

This reverts commit 174d0d0b.

* Revert "fix travis ci"

This reverts commit 5656fa9f.

test=develop

* add inlined_vector.h, test=develop

* add inlined_vector_test,test=develop

Async exe support communicator (#17386) · 58f7695a

Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
22 May, 2019 · 1 commit

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

Committed by guomingz on May 22, 2019

* Relu6 is the bottleneck op for Mobilenet-v2. Since MKLDNN supports the conv/relu6 fusion, we implement this fusion as a pass. Because int8 support for this fusion will only arrive in MKLDNN v0.20, this PR focuses on the fp32 optimization.

The table below shows the benchmark (FPS) measured on skx-8180 (28 cores):

| Batch size | With fusion | Without fusion |
| --- | --- | --- |
| 1 | 214.7 | 53.4 |
| 50 | 1219.727 | 137.280 |

test=develop
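
As a quick check on the benchmark figures above, the speedup from fusion can be computed directly from the reported FPS values. This sketch is not part of the commit; the names `benchmark_fps` and `fusion_speedup` are illustrative. The ratios come out to roughly 4.0x at batch size 1 and 8.9x at batch size 50.

```python
# FPS figures copied from the benchmark table in this commit message.
# The dictionary layout and function name are illustrative, not from the PR.
benchmark_fps = {
    1: {"with_fusion": 214.7, "without_fusion": 53.4},
    50: {"with_fusion": 1219.727, "without_fusion": 137.280},
}


def fusion_speedup(with_fusion: float, without_fusion: float) -> float:
    """Return throughput with fusion divided by throughput without it."""
    return with_fusion / without_fusion


# Map each batch size to its speedup ratio.
speedups = {
    batch_size: fusion_speedup(**row) for batch_size, row in benchmark_fps.items()
}

for batch_size, ratio in speedups.items():
    print(f"batch size {batch_size}: {ratio:.2f}x")
```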

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Embed the parameter definitions in the code,
which makes the code easier to understand.

test=develop

20 May, 2019 · 2 commits
- remove two useless flags: enable_subgraph_optimize, memory_optimize_debug, test=develop (#17491) · c3949f56
  Committed by liuwei1031 on May 20, 2019
- remove unused expected_kernel_cache_pass (#17486) · 32da5e9c
  Committed by Tao Luo on May 20, 2019
```
test=develop
```
17 May, 2019 · 2 commits
- Add record event And remove CSP (#17447) · 5a6ab380
  Committed by chengduo on May 17, 2019
```
* add record_event
test=develop

* remove csp
test=develop
```
- add cache_update_mutex_ for operator test=develop (#17124) · 728bbaa4
  Committed by Qiao Longfei on May 17, 2019
```
* add cache_update_mutex_ for operator 
```
16 May, 2019 · 3 commits
- add inductive shape index (#17435) · 43c9561e
  Committed by guru4elephant on May 16, 2019
```
add inductive shape index
```
- fix recurrent_op,test=develop (#17433) · 712bfb17
  Committed by Zeng Jinle on May 16, 2019
- Revert "remove unnecessary prepare_data (#17080)" (#17432) · 5babcd02
  Committed by Tao Luo on May 16, 2019
```
This reverts commit aca60e9a.
```