Commits · a0d80754c54c0e5f26085ff9274a87670a58fdb4 · 机器未来 / Paddle

17 Sep, 2019 5 commits

Add comments for CUDA Device Context Allocator related stuff (#19809) · a0d80754
Committed by Huihuang Zheng on Sep 17, 2019

Feature/add transform data dygraph (#19707) · cc311bdf

Committed by Jiabin Yang on Sep 17, 2019

* refactor dygraph,test=develop

* fix failed unittest,test=develop

* polish code,test=develop

* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop

* polish vlog and profiler, test=develop

* try to fix preceding ops order,test=develop

* test transformer in windows ci, test=develop

* use python c-api to speed up tracer.trace,test=develop

* test=develop, fix docker with paddle nccl problem

* test=develop, add ut for debug string and gradient_accumulator

* test=develop, add tests for layer/gradient_accumulator/prepared_op

* test=develop, fix compile error for test_prepared_op

* test=develop, add more ut for dygraph

* test=develop, create API.spec for dygraph api change

* add transform_data to dygraph

* test=develop, refactor name to make it easier to understand

* test=develop, refactor name to make it easier to understand

* add test and change input to const ref for safety

* test=develop, fix multi-gpu failed problem, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ

* add ut for data transform

* refine ut for data_transform

* test=develop, fix ut failed on parallel se-resnext

* test=develop, change one more PADDLE_ENFORCE

* add test_tracer on multiple devices

* test=develop, change place to mutable for data transform

* test=develop, add transform data on same place test and remove useless log

* test=develop, add TODO for data layout and ut for conv2d with no bias

cpu Conv double grad (#19672) · b76343c3
Committed by lvmengsi on Sep 17, 2019
```
* cpu conv_grad_grad
```

disable memory optimization passes when FLAGS_use_ngraph=True, test=develop (#19778) · 754fd57e
Committed by Zeng Jinle on Sep 17, 2019

Implement FusedEmbeddingSeqPoolGradKernel with cblas_saxpy (#19770) · 93c85c93

Committed by 翟飞跃 on Sep 17, 2019

* Implement the operator with sparse matrix multiply

* Update the URL of mklml library.

test=develop

* Disable MKLML implementation on non-Linux platforms.

test=develop

* optimize bp with mkl sparse matrix
test=develop

* tmp add fused_emb_seq layer

* Add the support of padding_idx attribute.

test=develop

* add padding_idx support
test=develop

* implement grad refer lego
test=develop

16 Sep, 2019 8 commits

Fix warning info of build_strategy (#19805) · 82814970
Committed by chengduo on Sep 16, 2019
```
* fix warning info
test=develop

* fix bug of all_reduce_deps_pass
test=develop
```

fix retry allocator bug, test=develop (#19794) · b34933d9
Committed by Zeng Jinle on Sep 16, 2019

Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733) · c67c8758

Committed by Yiqun Liu on Sep 16, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

* Enhance fc_fuse_pass to enable fusing relu.

* Allow print the shapes of var_desc in graph.
test=develop

* Enhance fc_fuse_pass_tester.

* Remove the use of PADDLE_ENFORCE.
test=develop

* Correct the number of ops after fusing.
test=develop

* Fix a typo.
test=develop

* Set activation_type to null when there is no relu in fc.
test=develop

* Refine fc_fuse_pass's codes.

* Enable the set of shape for tensor.

* Refine repeated_fc_relu_pass and add unittest.
test=develop
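The effect of fusing relu into fc can be pictured with a small sketch. This is a hypothetical pure-Python illustration, not Paddle's fc_fuse_pass: ops are plain dicts, the helper name `fuse_fc_relu` is invented for this example, and using an empty string for "no activation" is an assumption. It also assumes each fc output feeds only the op that directly follows it; a real graph pass would have to verify that.

```python
def fuse_fc_relu(ops):
    """Fold each relu that directly consumes an fc output into that fc op.

    `ops` is a list of dicts with "type", "input" and "output" keys
    (a made-up representation used only for this sketch).
    """
    fused = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        is_fc_relu_pair = (
            op["type"] == "fc"
            and nxt is not None
            and nxt["type"] == "relu"
            and nxt["input"] == op["output"]
        )
        if is_fc_relu_pair:
            # One fc op replaces the pair and writes the relu's output.
            fused.append({
                "type": "fc",
                "input": op["input"],
                "output": nxt["output"],
                "activation_type": "relu",
            })
            i += 2
        else:
            kept = dict(op)
            if kept["type"] == "fc":
                # Assumption: empty string marks an fc without activation.
                kept.setdefault("activation_type", "")
            fused.append(kept)
            i += 1
    return fused


ops = [
    {"type": "fc", "input": "x", "output": "h"},
    {"type": "relu", "input": "h", "output": "y"},
    {"type": "fc", "input": "y", "output": "z"},
]
fused_ops = fuse_fc_relu(ops)
```

In this toy graph the op count drops from three to two, which is the kind of bookkeeping the "Correct the number of ops after fusing" item refers to.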


reduce default value of cudnn workspace size, test=develop (#19780) · 32b1151f
Committed by Zeng Jinle on Sep 16, 2019

add kernel for squeeze_op, test=develop (#19656) · 52673956
Committed by zhongpu on Sep 16, 2019
```
* add kernel for squeeze_op, test=develop

* delete comment, test=develop
```

add kernel for unstack_op, test=develop (#19538) · 2a81c367

Committed by zhongpu on Sep 16, 2019

* add kernel for unstack_op, test=develop

* add kernel for unstack_op, test=develop

* add kernel for unstack_op, test=develop

* adjust the code format, test=develop

* modify some comment, test=develop


Add prune_backward function to cover complicated test_program.clone situation (#19772) · 00d5375e
Committed by Chen Weihang on Sep 16, 2019

fix softmax axis!=-1. test=develop (#19800) · 99c78b77
Committed by Kaipeng Deng on Sep 16, 2019
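For readers unfamiliar with what a softmax along an axis other than the last one means, here is a minimal pure-Python sketch. It is not Paddle's kernel and says nothing about what the bug was; the helper names `softmax_1d` and `softmax_2d` are invented for illustration, and the input is assumed to be a rectangular 2-D list.

```python
import math


def softmax_1d(values):
    """Numerically stable softmax of a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(matrix, axis=-1):
    """Softmax of a rectangular list of lists along axis 0 or 1."""
    if axis < 0:
        axis += 2
    if axis not in (0, 1):
        raise ValueError("axis must be -2, -1, 0 or 1 for a 2-D input")
    if axis == 1:
        return [softmax_1d(row) for row in matrix]
    # axis == 0: transpose so the target axis becomes the last one,
    # apply the last-axis softmax, then transpose back.
    columns = [list(col) for col in zip(*matrix)]
    normalized = [softmax_1d(col) for col in columns]
    return [list(row) for row in zip(*normalized)]


scores = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
by_row = softmax_2d(scores, axis=-1)  # each row sums to 1
by_col = softmax_2d(scores, axis=0)   # each column sums to 1
```

The transpose, normalize, transpose-back pattern is one common way to reuse a last-axis routine for other axes.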


14 Sep, 2019 2 commits
- Add common CreateKey for mkldnn handlers (#19767) · d4413a54
  Committed by Adam on Sep 14, 2019
```
test=develop
```
- Fix the definition issue when used mkl_scsrmm and mkl_dcsrmm functions. (#19774) · 0d6ea529
  Committed by Yihua Xu on Sep 13, 2019
```
test=develop
```
13 Sep, 2019 1 commit

Open fuse all reduce option (#19765) · 056fdedd

Committed by chengduo on Sep 13, 2019

* Open fuse all reduce op
test=develop

* Add Fuse optimization op log

* Add log in fuse_optimizer op pass and fuse all_reduce op pass

* replace with boost::optional<bool>
test=develop

* Polish code
test=develop

* fix code coverage
test=develop

12 Sep, 2019 3 commits
- Remove constraint that last dimension is forced to be 1 by adding one_hot_v2 (#19716) · 8c7e4119
  Committed by Aurelius84 on Sep 12, 2019
```
* add one_hot_v2_op to remove last_dims==1 test=develop

* add api unittest code for CI_Coverage test=develop

* improve CI_Coverage rate by adding test_with_depth test=develop
```
- modify activation op API, delete use_cudnn args, test=develop, (#19758) · e352467c
  Committed by JesseyXujin on Sep 12, 2019
- Refactoring activation mkldnn op (#19748) · 9e4c9585
  Committed by Jacek Czaja on Sep 12, 2019
```
test=develop

- fix to BWD

test=develop
```
11 Sep, 2019 10 commits

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used for allocating memory for cuDNN. Because it is a singleton, removing it can improve memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, so no singleton is needed.

Also added data_feed_proto to operator to fix CI in CPU compilation
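The stream-callback idea in this message can be sketched without CUDA. The following is a hypothetical pure-Python stand-in, not Paddle's CUDADeviceContextAllocator: `FakeStream` and `StreamScopedAllocator` are invented names, and the "stream" is just a FIFO queue that runs work and callbacks in submission order when synchronized.

```python
class FakeStream:
    """Stand-in for a CUDA stream: queued work runs in FIFO order."""

    def __init__(self):
        self._queue = []

    def enqueue(self, fn):
        # Queue a kernel launch or a host callback (both are callables here).
        self._queue.append(fn)

    def synchronize(self):
        # Run everything queued so far, oldest first.
        while self._queue:
            self._queue.pop(0)()


class StreamScopedAllocator:
    """Per-stream allocator; buffers are freed by a callback on the stream."""

    def __init__(self, stream):
        self.stream = stream
        self.live = set()   # ids of buffers that are still allocated
        self._next_id = 0

    def allocate(self, size):
        # `size` is ignored in this sketch; only liveness is tracked.
        buf_id = self._next_id
        self._next_id += 1
        self.live.add(buf_id)
        return buf_id

    def free_after_pending_work(self, buf_id):
        # The free is queued behind work already submitted to the stream,
        # so earlier kernels still see a valid buffer.
        self.stream.enqueue(lambda: self.live.discard(buf_id))


stream = FakeStream()
allocator = StreamScopedAllocator(stream)
buf = allocator.allocate(1024)
seen_alive = []
stream.enqueue(lambda: seen_alive.append(buf in allocator.live))  # "kernel"
allocator.free_after_pending_work(buf)
alive_before_sync = buf in allocator.live
stream.synchronize()
alive_after_sync = buf in allocator.live
```

Because each allocator is tied to one stream, no process-wide singleton is needed to decide when temporary memory can be released.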


Make leaky relu inplacable (#19676) · 0daa5c97

Committed by Zeng Jinle on Sep 11, 2019

* make leaky relu inplacable, test=develop

* force add unittests to pass coverage, test=develop


refine math_op_patch, test=develop (#19727) · 078a6782
Committed by Zeng Jinle on Sep 11, 2019

Open fuse broadcast option (#18833) · e506c99c
Committed by chengduo on Sep 11, 2019
```
* fix vlog level and fuse option type
test=develop
```
- Softmax mkl-dnn refactoring (#19615) · 47f670d5
Committed by Jacek Czaja on Sep 11, 2019
```
test=develop

- Cosmetic fixes

test=develop
```

Implement the GPU kernel of fc operator (#19687) · a65c728e

Committed by Yiqun Liu on Sep 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop


Remove constraint that last dimension is forced to be 1 in huber_loss op (#19562) · 22301115

Committed by Aurelius84 on Sep 11, 2019

* Remove constraint that last dimension is forced to be 1 in huber_loss
test=develop

* add y[rank-1] == 1 when x_rank=y_rank test=develop

* modify into contain_unknown_dim test=develop

Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418) · 5866a7a5
Committed by chengduo on Sep 11, 2019
```
* Enable fused_all_reduce_op_handle support GPU and CPU Gradients
```

fix api-doc error for dygraph and backward (#19721) · 3e5fb636

Committed by Youwei Song on Sep 11, 2019

* update dygraph api-doc and backward api-doc, test=develop

* update dygraph api-doc and backward api-doc, update api.spec, test=develop

* update dygraph api-doc and backward api-doc, update api.spec, test=develop

* update API.spec, test=develop

paddle::framework::vectorize() templatization (#19730) · ec9bc1bd
Committed by Tao Luo on Sep 11, 2019
```
remove unused accuracy-diff warpctc-cudnn implementation

test=develop
```

10 Sep, 2019 7 commits

add logs to left var memory size, test=develop (#19722) · bb4f8dee
Committed by Zeng Jinle on Sep 10, 2019

MKLDNN handler cleanup (#19713) · 428b2b9e
Committed by Adam on Sep 10, 2019
```
* MKLDNN handler cleanup

* MKLDNN handler cleanup
test=develop
```

Add document annotations for FLAGS that need to be open to external developers... · 27235cf2

Committed by XiaoguangHu on Sep 10, 2019

Add document annotations for FLAGS that need to be open to external developers test=develop (#19692)

Add document annotations for FLAGS that need to be open to external developers


refine memory usage of some operators, test=develop (#19700) · 1c25c88a
Committed by Zeng Jinle on Sep 10, 2019

merge empty lod tensor, test=develop (#19228) · 25dcd74d

Committed by wangguanzhong on Sep 10, 2019

* merge_empty_lod_tensor, test=develop

* fix multiclass_nms, test=develop

* refine API.spec, test=develop

* add unittest case for fetch, test=develop

* add lod tensor test, test=develop

* return index for multiclass_nms, test=develop

* add api for multiclass_nms2

* update API.spec, test=develop

* refine api doc, test=develop

* fix test_detection.py, test=develop

* polish code, test=develop

* add more unittest case, test=develop


fix instag op (#19591) · c6756ed2

Committed by yaoxuefeng on Sep 10, 2019

* fix instag op

* fix instag bug: Some tiny logical error, occurring when ins_tag (2nd input) is multiple. test=develop

Fix float16 optimizer. (#19682) · 6c2bc29c
Committed by gongweibao on Sep 10, 2019
```
Fix float16 optimizer
```

09 Sep, 2019 4 commits

refine tensor.mutable_data, test=develop (#19680) · 713c05dd
Committed by Zeng Jinle on Sep 09, 2019

Fix train error when test_program.clone is executed after optimizer.minimize (#19397) · c78a4781

Committed by Chen Weihang on Sep 09, 2019

* add prune when test_program.clone is executed after optimizer.minimize

* add unittest, test=develop

* add resnet and transformer test case, test=develop

* add regularization for optimizer & program compare function, test=develop

* add lstm unittest, test=develop

* polish code based on review comment, test=develop

* adapt to interface change in framework._prune, test=develop

* update API.spec, test=develop


add kernel for unsqueeze_op and Add unsqueezed op test, test=develop (#19436) · 5f627488

Committed by zhongpu on Sep 09, 2019

* add kernel for unsqueeze_op, test=develop

* add kernel for unsqueeze_op, test=develop

* add kernel for unsqueeze_op, test=develop


add gpu_allocator_try_time config, test=develop (#19675) · a7691603
Committed by Zeng Jinle on Sep 09, 2019

机器未来 / Paddle · In sync with the fork's source project