Commits · 89bc3fd841631a26b38fe424107688728493527f · BaiXuePrincess / Paddle

19 Jul, 2019 2 commits

Support memory eager deletion on recurrent OP (#17710) · 89bc3fd8

Committed by Huihuang Zheng on Jul 19, 2019

Test PaddingRNN on V100 GPU device.

Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU.

GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
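
As a quick check on the figures above, the relative memory saving and speed gain can be computed directly. This is only arithmetic on the two reported numbers; the variable and function names are illustrative and not part of the PR.

```python
# Figures copied from the commit message above (V100, large model, padding mode).
mem_with_pr_mib = 6414
mem_without_pr_mib = 6837
speed_with_pr = 10.28      # steps/s
speed_without_pr = 9.89    # steps/s


def relative_change(new, old):
    """Return (new - old) / old as a fraction."""
    return (new - old) / old


# Memory went down, so negate the change to express it as a saving.
mem_saving = -relative_change(mem_with_pr_mib, mem_without_pr_mib)
speedup = relative_change(speed_with_pr, speed_without_pr)

print(f"memory saved: {mem_without_pr_mib - mem_with_pr_mib} MiB ({mem_saving:.1%})")
print(f"speed gain: {speedup:.1%}")
```

On these numbers the PR saves 423 MiB (about 6.2%) and runs about 3.9% faster.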

Add LeakyRelu MKLDNN support (#18656) · d6b6a337
Committed by Adam on Jul 19, 2019
```
test=develop
```

18 Jul, 2019 2 commits
- hash_op support int64 hash_size (#18674) · bb2f5d24
  Committed by hutuxian on Jul 18, 2019
```
* hash_op support int64 hash_size
* add corresponding UT
```
- remove ctr reader, all functions are satisfied in dataset (#18672) · 5ed713d5
  Committed by guru4elephant on Jul 18, 2019
```
* remove ctr reader, all functions are satisfied in dataset
```
17 Jul, 2019 3 commits
- Add cuda implementation for `prelu` backward pass (#18633) · ce1ec332
  Committed by Yang Zhang on Jul 17, 2019
```
* Add GPU implementation for `prelu` backward pass

test=develop

* Fix logic error in `prelu` GPU backward and simplify a bit

test=develop

* Fix `prelu` backward CUDA implementation

test=develop

CPU version was not used actually, so test passed
```
- [CPU] Fix the compiling issue with AVX512F macro. (#18634) · 97549a4f
  Committed by Yihua Xu on Jul 17, 2019
- [NGraph] handle dim element 0 of ngraph op (#18568) · 256ba7cb
  Committed by baojun on Jul 16, 2019
16 Jul, 2019 2 commits

[MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585) · 71d883b8

Committed by Jacek Czaja on Jul 16, 2019

* - Added partial draft of pooling acquire

- Workspace support

- compilation fix

- Added draft of pooling backward reimplementation

- Segfault fix

- reverted 'any' for diff_dst creation in pooling

- Lint fixes

test=develop

- lint fixes

test=develop

- Further lint fixes

test=develop

* - Fixes after review

test=develop

* - Lint fixes

test=develop

* - Even more lint fixes

test=develop

fix bug of scatter op (#18640) · f4ec7d54
Committed by chengduo on Jul 16, 2019
```
test=develop
```

15 Jul, 2019 1 commit
- make auc op compatible with 1 dim (#18551) · ab57d389
  Committed by guru4elephant on Jul 15, 2019
```
* make auc op compatible with 1 dim
```
11 Jul, 2019 2 commits

fix cudnn lstm shape bug; test=develop (#18492) · a20b2b43
Committed by Hongyu Liu on Jul 11, 2019

Feature/buffer_shared_inplace (#17911) · d3003a16

Committed by Zeng Jinle on Jul 11, 2019

* feature/buffer_shared_inplace, test=develop

* refine code, test=develop

* fix elementwise_add op cpu inplace and sum inplace bug, test=develop

* add unittest and debug log, test=develop

* fix parallel_executor scope bug, polish code, test=develop

* fix sum op, activation op, single_in_place_inference bug, test=develop

* remove kLocalExecScopeName, test=develop

* fix unittest,test=develop

* fix out_var first version bug, test=develop

* follow comments,test=develop

10 Jul, 2019 4 commits
- Clean unused code of dim and place (#18565) · be24e5b3
  Committed by Zeng Jinle on Jul 10, 2019
```
* clean code of dim and place, test=develop

* fix failed unittests, test=develop
```
- Activations MKLDNN ops refactoring (#18191) · 8869d7f7
  Committed by Jacek Czaja on Jul 10, 2019
- Register fp16 for concat_op (#18563) · b86234fc
  Committed by Yibing Liu on Jul 10, 2019
- fix compile error caused by gcc4.8 related commit;test=develop (#18567) · 5e1220ef
  Committed by Physher on Jul 10, 2019
09 Jul, 2019 3 commits
- Fix/gcc 4.8 ubt link error (#18558) · 667f88f9
  Committed by Jiabin Yang on Jul 09, 2019
```
* test=develop, fix docker with paddle nccl problem

* test=develop, fix/gcc_4.8_ubt_link_error

* test=develop, fix code format
```
- Add mkldnn int8 mul-op kernel (#17834) · 0caa08ea
  Committed by Physher on Jul 09, 2019
- Fix roi_perspective_transform_op bug (#18522) · 24d1c44a
  Committed by LielinJiang on Jul 09, 2019
```
* fix transform matrix bug, test=develop

* modify API.spec
```
08 Jul, 2019 1 commit

Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532) · 88b52a27

Committed by Zhaolong Xing on Jul 08, 2019

* Fix Mask rcnn predictor
    1. refine memory optim algorithm to support the model with the block op.
    2. output diff : modify the affine channel fuse
    3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop

* add the missing files.
test=develop

05 Jul, 2019 1 commit

Fix topk cannot handle 1D vector bug (#18466) · 832d8191

Committed by zhaoyuchen2018 on Jul 05, 2019

* Fix topk cannot handle 1D vector bug

Add path to handle 1D vector

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
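
The commit message says only that a path was added for 1D input; it does not describe how the kernel does it. One common way to get that behaviour, sketched below in plain Python, is to treat a 1D vector as a single row and unwrap the result afterwards. The function names `topk_rows` and `topk` are hypothetical and are not Paddle APIs.

```python
import heapq


def topk_rows(matrix, k):
    """Top-k values and their indices for each row of a 2-D list."""
    values, indices = [], []
    for row in matrix:
        # Indices of the k largest entries, largest first.
        best = heapq.nlargest(k, range(len(row)), key=row.__getitem__)
        indices.append(best)
        values.append([row[i] for i in best])
    return values, indices


def topk(x, k):
    """Accept either a 1-D vector or a 2-D list of rows (assumed non-empty)."""
    is_1d = not isinstance(x[0], (list, tuple))
    rows = [x] if is_1d else x          # promote a vector to a single row
    values, indices = topk_rows(rows, k)
    if is_1d:
        return values[0], indices[0]    # unwrap back to 1-D results
    return values, indices


print(topk([1, 5, 3, 4], 2))
print(topk([[1, 2, 3], [9, 7, 8]], 1))
```

The 2D path stays unchanged in this sketch, so only the wrapper needs to know about the 1D case.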

04 Jul, 2019 2 commits
- Refine Infershape in activation_op for double_grad. (#18485) · 7ac4818a
  Committed by qingqing01 on Jul 04, 2019
```
* Refine Infershape in activation_op for double_grad.
```
- Make fuse_all_reduce_op_pass support mix_precision (#17652) · 74538573
  Committed by chengduo on Jul 04, 2019
03 Jul, 2019 6 commits
- support Tensor input for edit_distance op (#18162) · 7c6f2350
  Committed by zhoukunsheng on Jul 03, 2019
- support Tensor input for chunk_eval op (#18226) · 26318544
  Committed by zhoukunsheng on Jul 03, 2019
```
* test=develop
support Tensor input for chunk_eval op

* test=develop
fix testcase for chunk_eval op

* test=develop
fix typos in nn.py
```
- add unique kernel and op (#17557) · 206c44e2
  Committed by zhoukunsheng on Jul 03, 2019
- upgrade hash op to support Tensor and LoDTensor input (#17998) · 71af72b1
  Committed by zhoukunsheng on Jul 03, 2019
- add ones_like op (#17388) · d3b3443d
  Committed by zhoukunsheng on Jul 03, 2019
- add size op (#17412) · 67b48d7f
  Committed by zhoukunsheng on Jul 03, 2019
02 Jul, 2019 3 commits

rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453) · 8f5fffca

Committed by Leo Zhao on Jul 02, 2019

* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id()

test=develop

* update session id definition and adjust logic for default behavior

test=develop

* reset logic in mkldnn reuse, as most cases work with the default.

test=develop

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since the allreduce op has 4 reduce types, we split these four reduce types into four separate ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, since the target DeviceContext is already known
3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis
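
Item 1 above can be illustrated with a small single-process simulation. It is only a sketch: the commit message does not name the four reduce types, so sum, product, max and min are assumed here, and the function names are hypothetical rather than the op names registered by the PR. Each "rank" is a plain Python list, and no real communication takes place.

```python
from functools import reduce
import operator


def _allreduce(per_rank, combine):
    """Combine element-wise across ranks and give every rank the result."""
    reduced = [reduce(combine, column) for column in zip(*per_rank)]
    return [list(reduced) for _ in per_rank]


# One thin entry point per reduce type, instead of one op with a type attribute.
def allreduce_sum(per_rank):
    return _allreduce(per_rank, operator.add)


def allreduce_prod(per_rank):
    return _allreduce(per_rank, operator.mul)


def allreduce_max(per_rank):
    return _allreduce(per_rank, max)


def allreduce_min(per_rank):
    return _allreduce(per_rank, min)


ranks = [[1, 2], [3, 4]]  # two ranks, two elements each
print(allreduce_sum(ranks))
print(allreduce_max(ranks))
```

Splitting by reduce type means each op carries no mode switch of its own, and the shared logic lives in one helper.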

Add find_no_grad_vars in backward.py (#17942) · e0d8c6ac
Committed by chengduo on Jul 02, 2019
```
* add not_been_used_vars to no_grad_set
test=develop
```

01 Jul, 2019 2 commits

Make roi_perspective_transform op return mask and transform matrix (#18371) · 449c7a9f

Committed by LielinJiang on Jul 01, 2019

* modify roi_perspective_transform_op to output mask and transform matrix

* modify comment

* modify comment

* modify API.spec

* update API.spec

* remove unused header, test=develop

* resolve conflict

Fix bug in quantize kernel which causes crash in vgg16/19 model (#17964) · 4bc2987d

Committed by Brian Liu on Jul 01, 2019

* Fix bug in quantize kernel which causes crash in vgg16/19 model

test=develop

* refine the code to reduce verbose code; test=develop

* remove useless code; test=develop

28 Jun, 2019 2 commits

Fix potential mkldnn concat/pool/conv kernel issues (#18393) · 681d3553

Committed by Leo Zhao on Jun 28, 2019

1. Some key generation methods are not aligned with PR#17965
2. Extend the ptr lifetime so the memory is not released if SetBlob fails;
   otherwise it will core dump.

test=develop

Add a unittest to inplace elementwise_add (#18385) · f5641000
Committed by Zeng Jinle on Jun 28, 2019
```
* add_elementwise_add_inplace_test,test=develop

* rename file, test=develop
```

27 Jun, 2019 4 commits

fix communicator with pyreader (#18350) · 999d9a59
Committed by tangwei12 on Jun 27, 2019
```
* add is_runnning in communicator, test=develop
```

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump on py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

add int8 mkldnn prior_box (#17242) · 9252e8fa
Committed by Sylwester Fraczek on Jun 27, 2019
```
add prior_box quantization code

add scale algo rules for prior box

test=develop
```

[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146) · c2efdfd5

Committed by Jacek Czaja on Jun 27, 2019

* - Reusing of reorder used in elementwise_add_mkldnn

- Added MKL-DNN sum prim reusing

test=develop

- Compilation fixes

test=develop

- Yet another compilation fix

test=develop

- Yet another compilation fix

test=develop

- Yet another linking fix

test=develop

- Final compilation fix

test=develop

- lint fixes

test=develop

- Lint fixes

test=develop

* - Fixes after review

test=develop

BaiXuePrincess / Paddle: in sync with the fork's source project