提交 · d234aa02cd4bcc57f600f7bd18dea1abadc7ae48 · Crayon鑫 / Paddle

02 7月, 2019 2 次提交

rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453) · 8f5fffca

由 Leo Zhao 提交于 7月 02, 2019

* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id()

test=develop

* update session id definition and adjust logic for default behavior

test=develop

* reset logic in mkldnn reuse as most of cases work in default.

test=develop

8f5fffca

supports collective training with programs (#18392) · a873fa84

由 Yi Liu 提交于 7月 02, 2019

1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext
3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis

a873fa84

01 7月, 2019 1 次提交

Fix bug in quantize kernel which cause crash in vgg16/19 model (#17964) · 4bc2987d

由 Brian Liu 提交于 7月 01, 2019

* Fix bug in quantize kernel which cause crash in vgg16/19 model

test=develop

* refine the code to reduce verbose code; test=develop

* remove useless code; test=develop

4bc2987d

28 6月, 2019 1 次提交

Fix potential mkldnn concat/pool/conv kernel issues (#18393) · 681d3553

由 Leo Zhao 提交于 6月 28, 2019

1. some key generation method is not aligned with PR#17965
2. enlarge ptr lifetime to avoid memory release if SetBlob fails
   otherwise it will get core dump.

test=develop

681d3553

27 6月, 2019 4 次提交

H
add dependecy of collective_helper (#18365) · 9931bc64
由 HaoRen 提交于 6月 27, 2019
```
* add dependecy of collective_helper

* test=develop
fix dependecy of collective_helper
```
9931bc64
M
Reset DeviceContext after quantization warmup (#18182) · 84096932
由 Michał Gallus 提交于 6月 27, 2019
```
test=develop
```
84096932

supports collective communicated training (#18175) · b7128bac

由 HaoRen 提交于 6月 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146) · c2efdfd5

由 Jacek Czaja 提交于 6月 27, 2019

* - Reusing of reuder used in elementwise_add_mkldnn

- Added MKL-DNN sum prim reusing

test=develop

- Compilation fixes

test=develop

- Yet another compilation fix

test=develop

- Yet another compilation fix

test=develo

- Yet another linking fix

test=develop

- Final compilation fix

test=develop

- lint fixes

test=develop

- Lint fixes

test=develop

* - Fixes after review

test=develop

c2efdfd5

18 6月, 2019 1 次提交
- C
  Remove nccl dep when the number of GPU is 1 (#18158) · 4978db2c
  由 chengduo 提交于 6月 18, 2019
```
* remove nccl dep when the number of GPU is 1
test=develop
```
  4978db2c
14 6月, 2019 1 次提交
- G
  
  Fix reinitialized ncclid error! (#18025) · f5caf344
  由 gongweibao 提交于 6月 14, 2019
  
  f5caf344
11 6月, 2019 2 次提交

[MKL-DNN] Thread-Safety for MKL-DNN reusing Part 1 (#17965) · 84bb45c0

由 Jacek Czaja 提交于 6月 11, 2019

* - removed is_reusing_

* - Added TID to keys for reusing apart from softmax PD

* - compilation fix

* - Yet another compilation fix

* - Batch Norm and Conv adapted

* - Fix to softmax MT

* - Fixes to MT code of MKL-DNN

* - Lint fixes

test=develop

84bb45c0

Pipeline Concurrency (#17402) · 969e6378

由 hutuxian 提交于 6月 11, 2019

Add Pipeline Concurrency Train Mode:
- Cpp: pipeline_trainer & section_worker
- Python: PipelineOptimizer
- Add a new data_feed type: PrivateInstantDataFeed
- Add a test demo of pipeline trainer and the test model is gnn
- Do not support win32 now

969e6378

10 6月, 2019 1 次提交
- Z
  Remove attribute in Allocator::Allocate (#17878) · 3ece61f7
  由 Zeng Jinle 提交于 6月 10, 2019
```
* remove attribute in Allocator::Allocate, test=develop

* fix travis ci error, test=develop
```
  3ece61f7
07 6月, 2019 1 次提交
- Z
  Fix cuda/cudnn version detection error (#17853) · 3925bd81
  由 Zeng Jinle 提交于 6月 07, 2019
```
* fix cuda/cudnn version detection error, test=develop

* fix again, test=develop
```
  3925bd81
05 6月, 2019 1 次提交
- C
  remove InstallFailureSignalHandler (#17828) · d1169afa
  由 chengduo 提交于 6月 05, 2019
```
test=develop
```
  d1169afa
04 6月, 2019 1 次提交
- L
  enable mkldnn primitive reuse for platform reorder (#17826) · 50326563
  由 Leo Zhao 提交于 6月 04, 2019
```
test=develop
```
  50326563
03 6月, 2019 2 次提交
- W
  revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753) · c10157a5
  由 wangchaochaohu 提交于 6月 03, 2019
```
* revise conv layer cudnn algo choose test=develop

* update for code style test=develop

* update for code style test=develop
```
  c10157a5
- C
  polish error doc (#17772) · 863c7516
  由 chengduo 提交于 6月 03, 2019
```
test=develop
```
  863c7516
29 5月, 2019 1 次提交
- G
  
  fix 2dconn test=develop (#17681) · 0d561ef4
  由 gongweibao 提交于 5月 29, 2019
  
  0d561ef4
27 5月, 2019 1 次提交
- G
  
  Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950
  由 gongweibao 提交于 5月 27, 2019
  
  65bbf950
24 5月, 2019 2 次提交
- W
  add __str__ method for tensor and lodtensor to support print test=dev… (#17588) · 6724a652
  由 wopeizl 提交于 5月 24, 2019
```
* add __str__ method for tensor and lodtensor to support print test=develop
```
  6724a652
- M
  [NGraph] Enable assign operator for a ngraph, test=develop (#17437) · f2694e12
  由 mozga-intel 提交于 5月 23, 2019
```
*  Enable assign operator for a ngraph, test=develop

* Cross_entropy operators needs to be updated
```
  f2694e12
23 5月, 2019 2 次提交

Fix allocator bug (#16712) · c6189637

由 Zeng Jinle 提交于 5月 23, 2019

* Revert "Revert "Fix allocator bug""

This reverts commit 174d0d0b.

* Revert "fix travis ci"

This reverts commit 5656fa9f.

test=develop

* add inlined_vector.h, test=develop

* add inlined_vector_test,test=develop

c6189637

M

[NGraph] Enable reshape operator test=develop (#17512) · 109b5aed
由 mozga-intel 提交于 5月 22, 2019

109b5aed

22 5月, 2019 1 次提交

Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0

由 guomingz 提交于 5月 22, 2019

* Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization.

Below table shows the benchmark(FPS) which measured on skx-8180(28 cores)
Batch size | with fusion | without fusion
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280

test=develop

* Fix the format issue

test=develop

* Add the missing nolint comments.

test=develop

* Fix the typos.

test=develop

* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.

test=develop

* Adjust the indentation.

test=develop

* Add the test_conv_brelu_mkldnn_fuse_pass case.

test=develop

* Slightly update the code per Baidu comments.
Let the parameter definition embedded into the code.
That's will make the code easy to understand.

test=develop

2281ebf0

20 5月, 2019 1 次提交
- Q
  Fix compiling error with cuDNN 5.1 (#17458) · 97f0ec23
  由 qingqing01 提交于 5月 20, 2019
```
test=develop
```
  97f0ec23
15 5月, 2019 1 次提交
- Z
  
  fix_dygraph_mem_leak, test=develop (#17396) · eab34b2d
  由 Zeng Jinle 提交于 5月 15, 2019
  
  eab34b2d
10 5月, 2019 1 次提交

Double backward of conv2d. (#17211) · e32c9888

由 qingqing01 提交于 5月 10, 2019

* Add conv2d_grad_grad_op
* Extracte the cuDNN conv algo searching code in conv_cudnn_helper.h.
    - Now use it in conv2d_grad_grad.
    - Will simply the searching code in conv2d and conv2d_grad in next PR.
* Enhance and fix bug in unit testing of gradient_checker.
* Support to fetch empty variables，return None in Python.

e32c9888

08 5月, 2019 3 次提交

Refine elementwise kernel. (#16952) · 792443ef

由 zhaoyuchen2018 提交于 5月 08, 2019

* Refine elementwise kernel.

Add a simple cuda kernel if grad x and y both exist
Use 2D block cuda kernel to do broadcast.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

792443ef

C
update assert (#17282) · db5e74ab
由 chengduo 提交于 5月 08, 2019
```
test=develop
```
db5e74ab

Adding lrn op for ngraph engine (#17189) · 7bd1d03e

由 baojun 提交于 5月 07, 2019

* added lrn op test=develop

* Added CreateConstant method test=develop

* avoid duplicates test=develop

7bd1d03e

07 5月, 2019 1 次提交
- T
  remove unused FLAGS_warpctc_dir (#17162) · ff1661f1
  由 Tao Luo 提交于 5月 07, 2019
```
* remove unused FLAGS_warpctc_dir

test=develop

* remove FLAGS_warpctc_dir

test=develop
```
  ff1661f1
30 4月, 2019 1 次提交
- H
  Fix a typo in gpu_info.cc (#17175) · e4a53324
  由 Huihuang Zheng 提交于 4月 30, 2019
```
test=develop
```
  e4a53324
28 4月, 2019 1 次提交

Use CudnnWorkspaceHandle in exhaustive search (#17082) · b9494058

由 Huihuang Zheng 提交于 4月 28, 2019

1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.

test=develop

b9494058

23 4月, 2019 1 次提交
- Z
  Make conv cudnn workspace size configurable (#17036) · 0c335dcd
  由 Zeng Jinle 提交于 4月 23, 2019
```
* make_conv_cudnn_ws_size_configurable, test=develop

* change std::max to std::min
test=develop
```
  0c335dcd
21 4月, 2019 1 次提交

Refine model gpu memory (#16993) · 1202d3fc

由 Zeng Jinle 提交于 4月 21, 2019

* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop

* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop

* follow comments
test=develop

1202d3fc

18 4月, 2019 1 次提交
- G
  
  Polish DGC code (#16818) · cbdb8a17
  由 gongweibao 提交于 4月 18, 2019
  
  cbdb8a17
16 4月, 2019 2 次提交

X
fix infershape bug · 5663fbfb
由 xuezhong 提交于 4月 16, 2019
```
test=develop
```
5663fbfb

[MKL-DNN] Added reusing of primitive descriptors (fp32) (#16667) · 87a44b11

由 Jacek Czaja 提交于 4月 15, 2019

* - Reuse of conv PD

- conv transpose pd reused

- Added PD reusing of softmax and Batch Norm

- Refactoring and removal of not needed routines of mkl-dnn ops

test=develop

- Fix to reusing conv

test=develop

- Lint fixes

test=develop

- Further lint fixes

test=develop

- Lint  fixes

test=develop

- lint fixes

test=develop

- Lint workaround

test=develop

* - Fix after review on including boost as third party header

test=develop

* - Fix after review. Name change to something more descriptive

test=develop

87a44b11

11 4月, 2019 1 次提交
- D
  make lodtensor_printer usable in gpu setting · a659b37a
  由 dongdaxiang 提交于 4月 11, 2019
```
test=develop
```
  a659b37a

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致