提交 · 8493f20ebc4e96313301066b9ed328829e882c6d · Crayon鑫 / Paddle

27 11月, 2019 1 次提交
- G
  Polish the codes of fc when needs padding (#21378) · 8493f20e
  由 GaoWei8 提交于 11月 27, 2019
```
test=develop
```
  8493f20e
26 11月, 2019 1 次提交

Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972) · 234060f8

由 GaoWei8 提交于 11月 26, 2019

* Add fc padding to solve mkl performance
test=develop

* fix gpu pass and error information
test=develop

* fix fc_fuse_pass_test
test=develop

* fix error information
test=develop

* fix error information
test=develop

* fix name and add fc op padding test
test=develop

* fix attributes
test=develop

* optimize fc padding
test=develop

* fix test
test=develop

234060f8

22 11月, 2019 1 次提交

add dequantize_abs_max op and modify lookup_table op (#20899) · f0b15184

由 Liufang Sang 提交于 11月 22, 2019

* add int8 kernel to lookup_table op and add dequantize op test=develop

* change paddle_enforce to paddle_enforce_eq test=develop

* change copyright and change some not suitable code test=develop

* remove debug log test=develop

* replace GetInputType with IndicateVarDataType test=develop

* fix EmptyGradMaker test=develop

* fix diff between cpu and gpu test=develop

* use memcopy when int8_t test=develop

f0b15184

14 11月, 2019 1 次提交
- W
  
  Fix warpctc in padding mode. (#21033) · cfdd1fc2
  由 whs 提交于 11月 14, 2019
  
  cfdd1fc2
12 11月, 2019 1 次提交

fix the computation for dx (grad for x) for prelu operation. (#20949) · e249d9a3

由 lilong12 提交于 11月 12, 2019

* set the default value of alpha for prelu to 0.25, test=develop

* add the call to __syncthreads(), test=develop

* fix the implementation of cpu prelu, test=develop

* repair the implementation of element mode prelu, test=develop

* modify test_prelu_op.py, test=develop

e249d9a3

08 11月, 2019 1 次提交

Add dependency for error_codes.proto (#21084) · 2f27b103

由 Chen Weihang 提交于 11月 08, 2019

* fix activation_functions deps, test=develop, test=document_fix

* add error_codes_proto deps, test=develop, test=document_fix

* try delete enforce.h, test=develop, test=document_fix

2f27b103

05 11月, 2019 2 次提交
- Z
  Fix ce ocr_recognition test fails (#20987) · 0059404e
  由 zhaoyuchen2018 提交于 11月 05, 2019
```
ocr_recognition fails, so add a path to handle small frame_size.

test=develop
```
  0059404e
- T
  refine murmurhash3_x64_128 for bloom_filter (#20996) · 25ffa844
  由 Tao Luo 提交于 11月 05, 2019
```
test=develop
```
  25ffa844
01 11月, 2019 1 次提交

Fix gru as small frame_size has error. (#20922) · 7f3a445e

由 zhaoyuchen2018 提交于 10月 31, 2019

seems shuffle_sync cannot handle small size

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

7f3a445e

31 10月, 2019 2 次提交
- Z
  maxout supports channel_last input (#20846) · 8d1e9f0f
  由 Zhang Ting 提交于 10月 31, 2019
```
* maxout support channel_last input, test=develop

* modified details of Input(X) and Attr(groups, axis) in doc, test=develop
```
  8d1e9f0f
- Z
  
  fix the bug of conv_transpose:compatible with Anylayout setting, test=develop (#20897) · c18f1bd7
  由 Zhang Ting 提交于 10月 31, 2019
  
  c18f1bd7
30 10月, 2019 1 次提交
- Z
  
  fix select_rows mergeadd bug, test=develop (#20876) · d4289125
  由 zhang wenhui 提交于 10月 30, 2019
  
  d4289125
28 10月, 2019 1 次提交
- A
  
  add pyramid_hash_op (#20698) · aacd16db
  由 Aurelius84 提交于 10月 28, 2019
  
  aacd16db
23 10月, 2019 1 次提交

Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and... · e89c16b9

由 Pei Yang 提交于 10月 23, 2019

Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733)

* fix pool2d trt converter, test=develop

* add fix for split op converter, test=develop

e89c16b9

16 10月, 2019 1 次提交
- Q
  Support fp16 in GPU impl of fused_elemwise_activation_op. (#20636) · 01eddc1a
  由 qingqing01 提交于 10月 16, 2019
```
* Support fp16 in fused_elemwise_activation_op.
* Fix unit testing in ONLY-CPU mode.
```
  01eddc1a
13 10月, 2019 1 次提交
- Z
  
  fix conv_transpose's bug: compatible with Anylayout setting, test=develop (#20589) · 78910480
  由 Zhang Ting 提交于 10月 13, 2019
  
  78910480
09 10月, 2019 1 次提交

mv two function in conv op for good code style (#20116) · ad60b3b8

由 liym27 提交于 10月 09, 2019

* Delete PadFuntion, include padding.h instead. test=develop

* move function(IsSymmetricPadding) from conv_cudnn_op.cu/conv_transpose_cudnn_op.cu to padding.h, test=develop

ad60b3b8

07 10月, 2019 1 次提交
- Z
  
  conv_transpose supports channel_last input, test=develop, test=document_preview (#20072) · cf6919bf
  由 Zhang Ting 提交于 10月 07, 2019
  
  cf6919bf
30 9月, 2019 1 次提交
- D
  Improve elementwise operators performance in same dimensions. (#19763) · 425279a5
  由 danleifeng 提交于 9月 30, 2019
```
Improve elementwise operators performance in same dimensions
```
  425279a5
29 9月, 2019 1 次提交

fix conv2d and conv3d: (#20042) · 3aa331d9

由 liym27 提交于 9月 29, 2019

1.support asymmetric padding;
    2.support padding algorithm:"SAME" and "VALID";
    3.support channel_last: data_format NHWC and NDHWC;
    4.change doc of python API and c++;

    test=develop, test=document_preview

3aa331d9

28 9月, 2019 1 次提交

fix pool2d pool3d,support asymmetric padding and channel_last (#19739) · 24010472

由 liym27 提交于 9月 28, 2019

* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithm:"SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring shape when input with negative dims in compile time;
5. change doc of python API and c++;
6. fix bug in cuda kernel when Attr(adaptive) is true.

test=develop,test=document_preview

* fix 'tensors' to 'Tensors'. test=develop,test=document_preview

* add test for converage ValueError.test=develop,test=document_preview

* resolve conflict in test_pool2d. test=develop

24010472

27 9月, 2019 1 次提交
- C
  Add fp16 support for pad and split (#19881) · fb2a9cdf
  由 chengduo 提交于 9月 27, 2019
```
* make pad and split support fp16
test=develop
```
  fb2a9cdf
25 9月, 2019 1 次提交

add support of matmul with multiple head even different width and height (#19708) · c670058a

由 Bob Zhu 提交于 9月 25, 2019

* add support of matmul with multiple head even different width and height

Original matmul with multiple head supports only the mat_a.width == mat_b.height,
in that case, mat_b will be horizontally split. In this patch, we extend the
support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
in this case, mab_b will be vertically split.

One example is A is [3, 8], B is [2, 16], head_number is 4. In this
case, A will be split as [3, 2], B will be (vertically) split as
[2, 4]. The final result will be 4 matrix of 4 matrix of [3,4], i.e. [3, 16]

test=develop

* add support of matmul with multiple head even different width and height

Original matmul with multiple head supports only the mat_a.width == mat_b.height,
in that case, mat_b will be horizontally split. In this patch, we extend the
support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
in this case, mab_b will be vertically split.

One example is A is [3, 8], B is [2, 16], head_number is 4. In this
case, A will be split as [3, 2], B will be (vertically) split as
[2, 4]. The final result will be 4 matrix of 4 matrix of [3,4], i.e. [3, 16]

test=develop

* refactor the code of matmul with multiple head even different width and height

test=develop

c670058a

23 9月, 2019 1 次提交
- K
  fix softmax CE time limit check failed (#19846) · 3f021781
  由 Kaipeng Deng 提交于 9月 23, 2019
```
* fix softmax ce time limit check failed. test=develop

* refine softmax calc. test=develop
```
  3f021781
20 9月, 2019 1 次提交
- A
  support 2-level lod of input in sequence_pool (#19839) · fcf53e55
  由 Aurelius84 提交于 9月 20, 2019
```
* support 2-level lod of input in sequence_pool test=develop

* fix lod level bug in .cu test=develop
```
  fcf53e55
16 9月, 2019 1 次提交
- K
  
  fix softmax axis!=-1. test=develop (#19800) · 99c78b77
  由 Kaipeng Deng 提交于 9月 16, 2019
  
  99c78b77
11 9月, 2019 2 次提交

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

由 Huihuang Zheng 提交于 9月 11, 2019

TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.

We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.

Also added data_feed_proto to operator to fix CI in CPU compilation

12542320

Implement the GPU kernel of fc operator (#19687) · a65c728e

由 Yiqun Liu 提交于 9月 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

a65c728e

05 9月, 2019 3 次提交
- 1
  fix the diff between async mode and async_half mode (#19535) · 2f037c31
  由 123malin 提交于 9月 05, 2019
```
* test=develop,  communicator merge add => merge average
```
  2f037c31
- T
  unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631) · 3ae939e4
  由 Tao Luo 提交于 9月 05, 2019
```
* remove assert.h

* change PADDLE_ASSERT_MSG to PADDLE_ENFORCE

test=develop

* fix tensorrt paddle_enforce

test=develop
```
  3ae939e4
- T
  paddle::framework::vectorize() templatization (#19627) · d6c85c96
  由 Tao Luo 提交于 9月 05, 2019
```
test=develop
```
  d6c85c96
04 9月, 2019 1 次提交
- T
  refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607) · 0a46d345
  由 Tao Luo 提交于 9月 04, 2019
```
test=develop
```
  0a46d345
03 9月, 2019 2 次提交
- T
  refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) · 75d15719
  由 Tao Luo 提交于 9月 03, 2019
```
test=develop
```
  75d15719
- T
  replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586) · 49523ea1
  由 Tao Luo 提交于 9月 03, 2019
```
* remove unused PADDLE_ASSERT(_IS_NOT_ERROR)

* replace PADDLE_ASSERT with PADDLE_ASSERT_MSG

test=develop
```
  49523ea1
02 9月, 2019 1 次提交
- Z
  
  fix the compilation issue on windows caused by mkl_CSRMM (#19533) · 84c72801
  由 zhouwei25 提交于 9月 02, 2019
  
  84c72801
29 8月, 2019 1 次提交
- Z
  
  fix sofmax seg fault in AVX, test=develop (#19487) · 11f2f784
  由 Zeng Jinle 提交于 8月 29, 2019
  
  11f2f784
20 8月, 2019 1 次提交

Use sparse matrix to implement fused emb_seq_pool operator (#19064) · b9203958

由 Yihua Xu 提交于 8月 20, 2019

* Implement the operator with sprase matrix multiply

* Update the URL of mklml library.

test=develop

* Disable MKLML implematation when using no-linux.

test=develop

* Ignore the deprecated status for windows

test=develop

b9203958

19 8月, 2019 1 次提交
- S
  change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205) · af0fbd90
  由 silingtong123 提交于 8月 19, 2019
```
* print error code if cuda related API fails
```
  af0fbd90
01 8月, 2019 1 次提交
- L
  Fix depthwise conv gpu kernel bug (#18582) · 22fa4c2d
  由 LielinJiang 提交于 8月 01, 2019
```
* fix depthwise conv gpu kernel bug, test=develop
* add more depthwise conv test, test=develop
```
  22fa4c2d
24 7月, 2019 1 次提交

Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60

由 Bob Zhu 提交于 7月 24, 2019

* extend matmul op to support multiple head multiplication

With the support of multiple head, the multiplication of two big matrixes is
split into multiplication of several (head_number) small matrixes. e.g. if
Mat A is [3, 24] and Mat B is [24, 4], when multiple A and B with head_number
as 4, Mat A will be split as 4 matrix of [3, 6] and Mat B will be 4 matrix of
[6, 4]. The result of final matrix will be 4 matrix of [3, 4], i.e. [3, 16].

220eef60

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致