Commits · e117114289768c26467dda984e6dd7f9e23bf890 · Crayon鑫 / Paddle

20 Sep, 2019 2 commits

Set states of recurrent op as dependent vars in prune (#19865) · e1171142

Committed by Huihuang Zheng on Sep 20, 2019

* Set states of recurrent op as dependent vars in prune of save inference model

This PR will fix the save/load inference model problem of RNN models.

The cause of the bug is that save_inference_model prunes OPs that don't contribute to Output. But in recurrent_op, States are not Output, so OPs that refer to States are pruned.

This fix adds States of recurrent_op as dependent var so that OPs referring States won't be pruned.
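The pruning behaviour described above can be sketched with a toy model. This is only an illustration under stated assumptions, not Paddle's prune implementation: ops are plain dictionaries listed in topological order, and the names `prune`, `dependent_vars`, and the variables `x_t`, `h_prev`, `h_t`, `out_t` are hypothetical.

```python
def prune(ops, targets, dependent_vars=()):
    """Keep only ops that (transitively) produce a needed variable.

    ops            -- list of {"type", "inputs", "outputs"} dicts, assumed
                      to be in topological order (hypothetical format)
    targets        -- names of the output variables to keep
    dependent_vars -- extra variables to treat as needed, e.g. RNN states
    """
    needed = set(targets) | set(dependent_vars)
    kept = []
    # Walk backwards so every consumer is visited before its producers.
    for op in reversed(ops):
        if needed.intersection(op["outputs"]):
            kept.append(op)
            needed.update(op["inputs"])
    kept.reverse()
    return kept


# Toy step block of a recurrent op: `scale` writes the state h_t,
# which is not an Output of the block.
step_ops = [
    {"type": "mul", "inputs": ["x_t", "h_prev"], "outputs": ["tmp"]},
    {"type": "tanh", "inputs": ["tmp"], "outputs": ["out_t"]},
    {"type": "scale", "inputs": ["out_t"], "outputs": ["h_t"]},
]

without_states = [op["type"] for op in prune(step_ops, {"out_t"})]
with_states = [op["type"] for op in prune(step_ops, {"out_t"}, {"h_t"})]

print(without_states)  # ['mul', 'tanh'] -- the state update is lost
print(with_states)     # ['mul', 'tanh', 'scale']
```

Passing the state names as extra dependent variables is what keeps the state-updating op alive in this toy version.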

fix reduce and broadcast to avoid multi-stream, test=develop (#19889) · b754700f
Committed by Zeng Jinle on Sep 20, 2019

19 Sep, 2019 4 commits

Fix conv2d+dequantize squash for residual fusion (#19545) · 3f1d0234
Committed by joanna.wozna.intel on Sep 19, 2019
```
* Fix conv2d+dequantize squash for residual fusion

test=develop

* Change condition

test=develop
```
Fix deps of prune (#19876) · a35557d8
Committed by Huihuang Zheng on Sep 19, 2019
```
Add boost as dependency of prune

fix #19862
```
fix SplitLodTensor when batch_size = 0, test=develop (#19866) · 578a2f5d
Committed by Leo Chen on Sep 19, 2019

Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6

Committed by Yiqun Liu on Sep 19, 2019

* Add fc_elementwise_layernorm_fuse pass and unittest.

* Add fused_fc_elementwise_layernorm op and its GPU kernel.
test=develop

* Apply fc_elementwise_layernorm_fuse_pass to GPU inference.

* Add the setting of attrs in the definition of binary_op.
test=develop

* Add comment.

* Implement the unittest.
test=develop

* Change the unittest name of layer_norm.
test=develop

18 Sep, 2019 3 commits
- refine executor_gc_helper codes, test=develop (#19814) · 3f87464e
  Committed by Zeng Jinle on Sep 18, 2019
  
- fix gc bug in controlflow ops, test=develop (#19827) · 3fd3b663
  Committed by Zeng Jinle on Sep 18, 2019
  
- [Bug fix] Disable memory reuse on feeded variables (#19835) · db26de83
  Committed by Zeng Jinle on Sep 18, 2019
```
* fix memory reuse bug on feeding variables, test=develop

* add comments to reference count members, test=develop
```
17 Sep, 2019 4 commits

rm return in vfork (#19734) · 40c66f8d
Committed by Thunderbrook on Sep 17, 2019
```
* rm return in vfork

* rm return in vfork
test=develop
```
support preload thread, optimize hdfs log, fix master+patch bug (#19695) · 6bf298bf
Committed by xujiaqi01 on Sep 17, 2019
```
* support preload thread
* sleep before fleet wrapper exit for pslib core dump
* optimize hdfs log
* fix master+patch bug
```
Feature/add transform data dygraph (#19707) · cc311bdf

Committed by Jiabin Yang on Sep 17, 2019

* refactor dygraph,test=develop

* fix failed unittest,test=develop

* polish code,test=develop

* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop

* polish vlog and profiler, test=develop

* try to fix preceding ops order,test=develop

* test transformer in windows ci, test=develop

* use python c-api to speed up tracer.trace,test=develop

* test=develop, fix docker with paddle nccl problem

* test=develop, add ut for debug string and gradient_accumulator

* test=develop, add tests for layer/gradient_accumulator/prepared_op

* test=develop, fix compile error for test_prepared_op

* test=develop, add more ut for dygraph

* test=develop, create API.spec for dygraph api change

* add transform_data to dygraph

* test=develop, refactor name to make it easier to understand

* test=develop, refactor name to make it easier to understand

* add test and change input to const ref for safety

* test=develop, fix multi-gpu failed problem, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ

* add ut for data transform

* refine ut for data_transform

* test=develop, fix ut failed on parallel se-resnext

* test=develop, change one more PADDLE_ENFORCE

* add test_tracer on multiple devices

* test=develop, change place to mutable for data transform

* test=develop, add transform data on same place test and remove useless log

* test=develop, add TODO for data layout and ut for conv2d with no bias

disable memory optimization passes when FLAGS_use_ngraph=True, test=develop (#19778) · 754fd57e
Committed by Zeng Jinle on Sep 17, 2019

16 Sep, 2019 3 commits

Fix warning info of build_strategy (#19805) · 82814970
Committed by chengduo on Sep 16, 2019
```
* fix warning info
test=develop

* fix bug of all_reduce_deps_pass
test=develop
```
Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733) · c67c8758

Committed by Yiqun Liu on Sep 16, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Move the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

* Enhance fc_fuse_pass to enable fusing relu.

* Allow printing the shapes of var_desc in graph.
test=develop

* Enhance fc_fuse_pass_tester.

* Remove the use of PADDLE_ENFORCE.
test=develop

* Correct the number of ops after fusing.
test=develop

* Fix a typo.
test=develop

* Set activation_type to null when there is no relu in fc.
test=develop

* Refine fc_fuse_pass's codes.

* Enable the set of shape for tensor.

* Refine repeated_fc_relu_pass and add unittest.
test=develop
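The relu fusion listed above can be sketched as a small rewrite over a linear op list. This is a toy illustration, not the fc_fuse_pass source: it assumes the fc pattern is a `mul` followed by an `elementwise_add`, uses a hypothetical dictionary format for ops, and represents "no activation" as an empty string for `activation_type`.

```python
def fuse_fc(ops):
    """Rewrite mul -> elementwise_add (-> relu) into a single fc op.

    A real pass would also have to check that the intermediate variables
    have no other consumers; this sketch skips that check.
    """
    fused = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (op["type"] == "mul" and nxt is not None
                and nxt["type"] == "elementwise_add"
                and op["outputs"][0] in nxt["inputs"]):
            # The other input of elementwise_add is treated as the bias.
            bias = [v for v in nxt["inputs"] if v != op["outputs"][0]]
            out = nxt["outputs"][0]
            activation = ""  # assumed encoding of "no relu in fc"
            consumed = 2
            after = ops[i + 2] if i + 2 < len(ops) else None
            if (after is not None and after["type"] == "relu"
                    and after["inputs"] == [out]):
                activation = "relu"
                out = after["outputs"][0]
                consumed = 3
            fused.append({
                "type": "fc",
                "inputs": op["inputs"] + bias,
                "outputs": [out],
                "attrs": {"activation_type": activation},
            })
            i += consumed
        else:
            fused.append(op)
            i += 1
    return fused


graph = [
    {"type": "mul", "inputs": ["x", "w"], "outputs": ["xw"]},
    {"type": "elementwise_add", "inputs": ["xw", "b"], "outputs": ["xw_b"]},
    {"type": "relu", "inputs": ["xw_b"], "outputs": ["y"]},
    {"type": "softmax", "inputs": ["y"], "outputs": ["prob"]},
]

result = fuse_fc(graph)
print([op["type"] for op in result])  # ['fc', 'softmax']
```

With the relu present, four ops become two; without it, the same function still emits an `fc` op but leaves `activation_type` empty.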

Add prune_backward function to cover complicated test_program.clone situation (#19772) · 00d5375e
Committed by Chen Weihang on Sep 16, 2019

14 Sep, 2019 1 commit
- Add common CreateKey for mkldnn handlers (#19767) · d4413a54
  Committed by Adam on Sep 14, 2019
```
test=develop
```
13 Sep, 2019 1 commit

Open fuse all reduce option (#19765) · 056fdedd

Committed by chengduo on Sep 13, 2019

* Open fuse all reduce op
test=develop

* Add Fuse optimization op log

* Add log in fuse_optimizer op pass and fuse all_reduce op pass

* replace with boost::optional<bool>
test=develop

* Polish code
test=develop

* fix code coverage
test=develop

11 Sep, 2019 6 commits

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, we can remove it for better memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, so the singleton is no longer needed.

Also added data_feed_proto to operator to fix CI in CPU compilation
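The idea of tying a buffer's lifetime to a stream callback instead of a global singleton can be illustrated with a toy model. This is a plain-Python simulation under loud assumptions: `ToyStream` and `StreamScopedAllocator` are hypothetical names, no CUDA API is involved, and it does not mirror the actual CUDADeviceContextAllocator code.

```python
class ToyStream:
    """Stand-in for a device stream: queued callbacks run on synchronize()."""

    def __init__(self):
        self._callbacks = []

    def add_callback(self, fn):
        self._callbacks.append(fn)

    def synchronize(self):
        # Run and clear the callbacks that were queued so far, in order.
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn()


class StreamScopedAllocator:
    """Per-stream allocator: each buffer is released by a stream callback."""

    def __init__(self, stream):
        self._stream = stream
        self._next_id = 0
        self.live = set()  # ids of buffers that have not been freed yet

    def allocate(self, size):
        # `size` is accepted for realism; the toy model only tracks ids.
        buf_id = self._next_id
        self._next_id += 1
        self.live.add(buf_id)
        # Free the buffer once the work queued on the stream has finished.
        self._stream.add_callback(lambda: self.live.discard(buf_id))
        return buf_id


stream = ToyStream()
allocator = StreamScopedAllocator(stream)
first = allocator.allocate(1024)
second = allocator.allocate(2048)
live_before = set(allocator.live)
stream.synchronize()
live_after = set(allocator.live)
print(live_before, live_after)
```

Because each allocator belongs to one stream, no process-wide object has to track temporary buffers in this model.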

Make leaky relu inplacable (#19676) · 0daa5c97

Committed by Zeng Jinle on Sep 11, 2019

* make leaky relu inplacable, test=develop

* force add unittests to pass coverage, test=develop

Open fuse broadcast option (#18833) · e506c99c
Committed by chengduo on Sep 11, 2019
```
* fix vlog level and fuse option type
test=develop
```
Implement the GPU kernel of fc operator (#19687) · a65c728e

Committed by Yiqun Liu on Sep 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Move the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418) · 5866a7a5
Committed by chengduo on Sep 11, 2019
```
* Enable fused_all_reduce_op_handle support GPU and CPU Gradients
```
paddle::framework::vectorize() templatization (#19730) · ec9bc1bd
Committed by Tao Luo on Sep 11, 2019
```
remove unused accuracy-diff warpctc-cudnn implementation

test=develop
```
10 Sep, 2019 2 commits

add logs to left var memory size, test=develop (#19722) · bb4f8dee
Committed by Zeng Jinle on Sep 10, 2019

merge empty lod tensor, test=develop (#19228) · 25dcd74d

Committed by wangguanzhong on Sep 10, 2019

* merge_empty_lod_tensor, test=develop

* fix multiclass_nms, test=develop

* refine API.spec, test=develop

* add unittest case for fetch, test=develop

* add lod tensor test, test=develop

* return index for multiclass_nms, test=develop

* add api for multiclass_nms2

* update API.spec, test=develop

* refine api doc, test=develop

* fix test_detection.py, test=develop

* polish code, test=develop

* add more unittest case, test=develop

09 Sep, 2019 1 commit
- refine tensor.mutable_data, test=develop (#19680) · 713c05dd
  Committed by Zeng Jinle on Sep 09, 2019
  
08 Sep, 2019 1 commit
- fix cmakelist deps (#19668) · 1ca6ea03
  Committed by hutuxian on Sep 08, 2019
```
fix cmakelist deps: remove unnecessary deps and add proper op deps
```
07 Sep, 2019 1 commit

remove -Wmaybe-uninitialized warning (#19653) · bcddbc78

Committed by Tao Luo on Sep 07, 2019

* remove -Wmaybe-uninitialized warning

test=develop

* remove uninitialized op_handle_ in scale_loss_grad_op_handle.cc

test=develop

06 Sep, 2019 1 commit
- codegen for fused elementwise operation (#19520) · ed8f44ea
  Committed by wangchaochaohu on Sep 06, 2019
```
* test=develop codegen for fused elementwise operation

* fix test=develop
```
05 Sep, 2019 3 commits
- add feed_var_names to Prune interface (#19589) · dca9b6c5
  Committed by mapingshuo on Sep 05, 2019
```
* Fix bug: add feed_vars to the prune function
```
- unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631) · 3ae939e4
  Committed by Tao Luo on Sep 05, 2019
```
* remove assert.h

* change PADDLE_ASSERT_MSG to PADDLE_ENFORCE

test=develop

* fix tensorrt paddle_enforce

test=develop
```
- fix scope lock bug on infer (#19624) · e3e98ed6
  Committed by tensor-tang on Sep 05, 2019
  
04 Sep, 2019 3 commits
- refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607) · 0a46d345
  Committed by Tao Luo on Sep 04, 2019
```
test=develop
```
- Enable ngraph through build_strategy (#19266) · a3a4b6e5
  Committed by baojun on Sep 04, 2019
```
* enable ngraph through build_strategy test=develop

* add unittest test=develop

* put use_ngraph unconditional test=develop

* remove paddle_enforce test=develop

* remove paddle_enforce test=develop

* fix copyright test=develop

* limit for ngraph only test=develop
```
- paddle::framework::vectorize() templatization (#19611) · 8d6d95cc
  Committed by Adam on Sep 04, 2019
```
test=develop
```
03 Sep, 2019 4 commits

refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) · 75d15719
Committed by Tao Luo on Sep 03, 2019
```
test=develop
```
Add a pass to enable the use of cudnn (#19346) · c5548178

Committed by Yiqun Liu on Sep 03, 2019

* Add an interface to enable cudnn for inference.

* Add cudnn_placement_pass.
test=develop

* Set the default value of cudnn_enabled_op_types to null.
test=develop

* Write the common basic class, placement_pass_base, to refine the codes.
test=develop

* Call EnableCUDNN in unittest.
test=develop

* Refine cudnn_placement_pass tester.

* Enable the testing of cudnn_placement_pass in inference's unittest.
test=develop

* Add the check of op kernels.
test=develop

using MKLDNNMemoryFormat = mkldnn::memory::format changes (#19568) · e94b26da
Committed by Adam on Sep 03, 2019
```
* using MKLDNNMemoryFormat = mkldnn::memory::format changes
test=develop

* PADDLE_ENFORCE update
test=develop
```
Change backward_guard to optimize_guard to maximize the allreduce overlap. (#19506) · abaf87be
Committed by gongweibao on Sep 03, 2019
```
Change backward_guard to optimize_guard to maximize the allreduce overlap
```