提交 · 3df38f5cdd0866c1e78f1c2674d3d6cf3166d35f · 机器未来 / Paddle

10 1月, 2020 1 次提交

[cherry-pick] Add FC padding, ernie test unit and layernorm parallel (#22198) · 3df38f5c

由 GaoWei8 提交于 1月 10, 2020

* Optimize the kernel implementation of layernorm with openmp (#20895)

* Add ernie c++ inference test (#21015)

* Add ernie unit test
test=develop

* Add ernie unit test
test=develop

* Add ernie unit test
test=develop

* remove ngraph

* optimize gpu test
test=develop

* optimize codes
test=develop

* fix cmake fails on inference_download_and_uncompress (#21185)

* solve cmake fails on inference_download_and_uncompress
test=develop

* solve cmake fails on inference_download_and_uncompress
test=develop

* Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)

* Add fc padding to solve mkl performance
test=develop

* fix gpu pass and error information
test=develop

* fix fc_fuse_pass_test
test=develop

* fix error information
test=develop

* fix error information
test=develop

* fix name and add fc op padding test
test=develop

* fix attributes
test=develop

* optimize fc padding
test=develop

* fix test
test=develop

* Polish the codes of fc when needs padding (#21378)

test=develop

* Add ernie large c++ inference test (#21365)

* add ernie-large test
test=develop

* add ernie large c++ inference test
test=develop

* Modify padding strategy: remove weight copy in fc padding (#21650)

test=develop

* optimize fc jit (#21878)

test=develop
Co-authored-by: NYihua Xu <yihuaxu@hotmail.com>

3df38f5c

06 12月, 2019 1 次提交
- A
  
  cherry-pick pyramid_hash op test=develop (#20779)(#18525) (#21562) · f83254d6
  由 Aurelius84 提交于 12月 06, 2019
  
  f83254d6
05 12月, 2019 1 次提交

[Cherry-pick] fix the computation for dx (grad for x) for prelu operation. (#20949) (#21514) · 40549473

由 lilong12 提交于 12月 05, 2019

* fix the computation for dx (grad for x) for prelu operation. (#20949)

* set the default value of alpha for prelu to 0.25, test=develop

* add the call to __syncthreads(), test=develop

* fix the implementation of cpu prelu, test=develop

* repair the implementation of element mode prelu, test=develop

* modify test_prelu_op.py, test=develop

40549473

02 12月, 2019 1 次提交
- Z
  [cherry-pick] Fix gru as small frame_size has error. (#20922) (#21440) · 873b32de
  由 zhaoyuchen2018 提交于 12月 02, 2019
```
seems shuffle_sync cannot handle small size

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
  873b32de
25 11月, 2019 1 次提交

[cherry-pick] fix crop_tensor, maxout and lrn (#21302) · 3848f720

由 Zhang Ting 提交于 11月 25, 2019

* [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)

* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview

* fix the bug that attr(offsets) should be initialized, test=develop

* [cherry-pick] maxout supports channel_last input (#20846)

* maxout support channel_last input, test=develop

* modified details of Input(X) and Attr(groups, axis) in doc, test=develop

* [cherry-pick] lrn supports channel_last input, test=develop (#20954)

3848f720

21 11月, 2019 1 次提交

Cherry-pick error type support for release1.6 (#21294) · 974b8a83

由 Chen Weihang 提交于 11月 21, 2019

* delete paddle infershape enforce marco (#20832)

* Polish and arrange code in enforce.h (#20901)

* Enrich the type of error and declare the error type interfaces (#21024)

* Enrich the type of error and declare the error type interfaces, test=develop

* adjust tests to adapt new form, test=develop

* add inference deps with error_codes.pb.h, test=develop

* restore stack iter start pos, test=develop

* polish code based review comments, test=develop

* Add dependency for error_codes.proto (#21084)

* fix activation_functions deps, test=develop, test=document_fix

* add error_codes_proto deps, test=develop, test=document_fix

* try delete enforce.h, test=develop, test=document_fix

* change cuda enforce & add example (#21142)
test=release/1.6

974b8a83

31 10月, 2019 2 次提交

Z
[cherry-pick] fix the bug of conv_transpose: compitable with AnyLayout... · 416b3417
由 Zhang Ting 提交于 10月 31, 2019
```
[cherry-pick] fix the bug of conv_transpose: compitable with AnyLayout setting, test=release/1.6 #(20897) (#20918)
```
416b3417

Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and... · 1948210c

由 Pei Yang 提交于 10月 31, 2019

Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733) (#20902)

* fix pool2d trt converter, test=develop

* add fix for split op converter, test=develop

1948210c

16 10月, 2019 1 次提交
- Q
  Support fp16 in GPU impl of fused_elemwise_activation_op. (#20636) (#20655) · 194f3dcf
  由 qingqing01 提交于 10月 16, 2019
```
* Support fp16 in fused_elemwise_activation_op.
* Fix unit testing in ONLY-CPU mode.
```
  194f3dcf
14 10月, 2019 1 次提交
- Z
  [cherry-pick] fix conv_transpose's bug: compatible with Anylayout setting · 63e9b700
  由 Zhang Ting 提交于 10月 14, 2019
```
fix conv_transpose's bug: compatible with Anylayout setting: cherry-pick 20589
```
  63e9b700
10 10月, 2019 1 次提交

[cherry-pick]mv two function in conv op for good code style (#20116) test=release/1.6 (#20268) · 0ebb3b5c

由 liym27 提交于 10月 10, 2019

* Delete PadFuntion, include padding.h instead.

* move function(IsSymmetricPadding) from conv_cudnn_op.cu/conv_transpose_cudnn_op.cu to padding.h.

0ebb3b5c

08 10月, 2019 1 次提交
- Z
  
  [cherry-pick] conv_transpose supports channel_last input, test=release/1.6 (#20072) (#20178) · 87a0a45d
  由 Zhang Ting 提交于 10月 08, 2019
  
  87a0a45d
03 10月, 2019 1 次提交

fix conv2d and conv3d: (#20042) (#20121) · 2faa38cd

由 liym27 提交于 10月 03, 2019

1.support asymmetric padding;
2.support padding algorithm:"SAME" and "VALID";
3.support channel_last: data_format NHWC and NDHWC;
4.change doc of python API and c++;

test=release/1.6

2faa38cd

01 10月, 2019 1 次提交
- D
  
  [cherry pick]Improve elementwise operators performance in same dimensions (#20134) · 43f11b5e
  由 danleifeng 提交于 10月 01, 2019
  
  43f11b5e
28 9月, 2019 1 次提交

fix pool2d pool3d,support asymmetric padding and channel_last (#19739) · 24010472

由 liym27 提交于 9月 28, 2019

* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithm:"SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring shape when input with negative dims in compile time;
5. change doc of python API and c++;
6. fix bug in cuda kernel when Attr(adaptive) is true.

test=develop,test=document_preview

* fix 'tensors' to 'Tensors'. test=develop,test=document_preview

* add test for converage ValueError.test=develop,test=document_preview

* resolve conflict in test_pool2d. test=develop

24010472

27 9月, 2019 1 次提交
- C
  Add fp16 support for pad and split (#19881) · fb2a9cdf
  由 chengduo 提交于 9月 27, 2019
```
* make pad and split support fp16
test=develop
```
  fb2a9cdf
25 9月, 2019 1 次提交

add support of matmul with multiple head even different width and height (#19708) · c670058a

由 Bob Zhu 提交于 9月 25, 2019

* add support of matmul with multiple head even different width and height

Original matmul with multiple head supports only the mat_a.width == mat_b.height,
in that case, mat_b will be horizontally split. In this patch, we extend the
support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
in this case, mab_b will be vertically split.

One example is A is [3, 8], B is [2, 16], head_number is 4. In this
case, A will be split as [3, 2], B will be (vertically) split as
[2, 4]. The final result will be 4 matrix of 4 matrix of [3,4], i.e. [3, 16]

test=develop

* add support of matmul with multiple head even different width and height

Original matmul with multiple head supports only the mat_a.width == mat_b.height,
in that case, mat_b will be horizontally split. In this patch, we extend the
support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
in this case, mab_b will be vertically split.

One example is A is [3, 8], B is [2, 16], head_number is 4. In this
case, A will be split as [3, 2], B will be (vertically) split as
[2, 4]. The final result will be 4 matrix of 4 matrix of [3,4], i.e. [3, 16]

test=develop

* refactor the code of matmul with multiple head even different width and height

test=develop

c670058a

23 9月, 2019 1 次提交
- K
  fix softmax CE time limit check failed (#19846) · 3f021781
  由 Kaipeng Deng 提交于 9月 23, 2019
```
* fix softmax ce time limit check failed. test=develop

* refine softmax calc. test=develop
```
  3f021781
20 9月, 2019 1 次提交
- A
  support 2-level lod of input in sequence_pool (#19839) · fcf53e55
  由 Aurelius84 提交于 9月 20, 2019
```
* support 2-level lod of input in sequence_pool test=develop

* fix lod level bug in .cu test=develop
```
  fcf53e55
16 9月, 2019 1 次提交
- K
  
  fix softmax axis!=-1. test=develop (#19800) · 99c78b77
  由 Kaipeng Deng 提交于 9月 16, 2019
  
  99c78b77
11 9月, 2019 2 次提交

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

由 Huihuang Zheng 提交于 9月 11, 2019

TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.

We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.

Also added data_feed_proto to operator to fix CI in CPU compilation

12542320

Implement the GPU kernel of fc operator (#19687) · a65c728e

由 Yiqun Liu 提交于 9月 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

a65c728e

05 9月, 2019 3 次提交
- 1
  fix the diff between async mode and async_half mode (#19535) · 2f037c31
  由 123malin 提交于 9月 05, 2019
```
* test=develop,  communicator merge add => merge average
```
  2f037c31
- T
  unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631) · 3ae939e4
  由 Tao Luo 提交于 9月 05, 2019
```
* remove assert.h

* change PADDLE_ASSERT_MSG to PADDLE_ENFORCE

test=develop

* fix tensorrt paddle_enforce

test=develop
```
  3ae939e4
- T
  paddle::framework::vectorize() templatization (#19627) · d6c85c96
  由 Tao Luo 提交于 9月 05, 2019
```
test=develop
```
  d6c85c96
04 9月, 2019 1 次提交
- T
  refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607) · 0a46d345
  由 Tao Luo 提交于 9月 04, 2019
```
test=develop
```
  0a46d345
03 9月, 2019 2 次提交
- T
  refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) · 75d15719
  由 Tao Luo 提交于 9月 03, 2019
```
test=develop
```
  75d15719
- T
  replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586) · 49523ea1
  由 Tao Luo 提交于 9月 03, 2019
```
* remove unused PADDLE_ASSERT(_IS_NOT_ERROR)

* replace PADDLE_ASSERT with PADDLE_ASSERT_MSG

test=develop
```
  49523ea1
02 9月, 2019 1 次提交
- Z
  
  fix the compilation issue on windows caused by mkl_CSRMM (#19533) · 84c72801
  由 zhouwei25 提交于 9月 02, 2019
  
  84c72801
29 8月, 2019 1 次提交
- Z
  
  fix sofmax seg fault in AVX, test=develop (#19487) · 11f2f784
  由 Zeng Jinle 提交于 8月 29, 2019
  
  11f2f784
20 8月, 2019 1 次提交

Use sparse matrix to implement fused emb_seq_pool operator (#19064) · b9203958

由 Yihua Xu 提交于 8月 20, 2019

* Implement the operator with sprase matrix multiply

* Update the URL of mklml library.

test=develop

* Disable MKLML implematation when using no-linux.

test=develop

* Ignore the deprecated status for windows

test=develop

b9203958

19 8月, 2019 1 次提交
- S
  change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205) · af0fbd90
  由 silingtong123 提交于 8月 19, 2019
```
* print error code if cuda related API fails
```
  af0fbd90
01 8月, 2019 1 次提交
- L
  Fix depthwise conv gpu kernel bug (#18582) · 22fa4c2d
  由 LielinJiang 提交于 8月 01, 2019
```
* fix depthwise conv gpu kernel bug, test=develop
* add more depthwise conv test, test=develop
```
  22fa4c2d
24 7月, 2019 1 次提交

Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60

由 Bob Zhu 提交于 7月 24, 2019

* extend matmul op to support multiple head multiplication

With the support of multiple head, the multiplication of two big matrixes is
split into multiplication of several (head_number) small matrixes. e.g. if
Mat A is [3, 24] and Mat B is [24, 4], when multiple A and B with head_number
as 4, Mat A will be split as 4 matrix of [3, 6] and Mat B will be 4 matrix of
[6, 4]. The result of final matrix will be 4 matrix of [3, 4], i.e. [3, 16].

220eef60

28 6月, 2019 1 次提交
- Z
  Add a unittest to inplace elementwise_add (#18385) · f5641000
  由 Zeng Jinle 提交于 6月 28, 2019
```
* add_elementwise_add_inplace_test,test=develop

* rename file, test=develop
```
  f5641000
25 6月, 2019 1 次提交

Sequence mask support tensor (#18249) · df2eee71

由 Hongyu Liu 提交于 6月 25, 2019

* sequnce mask support max length tensor input; test=develop

* add rnn_impl.py; test=develop

* add basic gru lstm unittest; test=develop

* fix api spec; test=develop

* fix sequence_mask op bug;
test=develop
test=document_preview

* change +-*x to elmentwise_op; test=develop

* add mkl flag; test=develop

* fix rnn impl bug; test=develop

* update api spec; test=develop

* fix doc bug; test=develop

* fix lstm bugs; test=develop

df2eee71

14 6月, 2019 1 次提交
- Y
  Optimize fused_elewise_activation_grad op. (#18041) · 660c1a65
  由 Yiqun Liu 提交于 6月 14, 2019
```
test=develop
```
  660c1a65
12 6月, 2019 1 次提交
- Y
  Optimize the concat and split cuda implementation for cases when the number of... · 7e463c84
  由 Yiqun Liu 提交于 6月 12, 2019
```
Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979)

test=develop
```
  7e463c84
10 6月, 2019 1 次提交

Enable seq_pool op to accept len 0 input (#17284) · 33d1e565

由 Yibing Liu 提交于 6月 10, 2019

* Enable seq_pool op to accept len 0 input

test=develop

* Update sequence_pool's api

test=develop

* Add more unittest cases for seq_pool op

test=develop

* Remove legacy comments

test=develop

* Don't use template in op maker

test=develop

33d1e565

30 5月, 2019 1 次提交

Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236) · 8fd39f3e

由 Yiqun Liu 提交于 5月 30, 2019

* Enhance fused_elementwise_activation op.
test=develop

* Move the api fused_elementwise_activation to contrib.
test=develop

* Add including files.
test=develop

* Add the support of sigmoid in fused_elementwise_activetion op.

* Update API.spec.
test=develop

8fd39f3e

机器未来 / Paddle 与 Fork 源项目一致

机器未来 / Paddle
与 Fork 源项目一致