Commits · 12542320c52720c9812e2d558a953bcc397c8546 · Crayon鑫 / Paddle

11 Sep, 2019 2 commits

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, deleting it should give better memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed.

Also adds data_feed_proto to operator to fix CI for CPU compilation.
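
The stream-callback release mentioned in this message can be pictured with a toy model. This is a plain-Python sketch, not CUDA and not Paddle's code: `ToyStream`, `ToyStreamAllocator`, and all of their methods are hypothetical names invented for illustration. It only shows the ordering idea, assuming a stream runs queued work in FIFO order, so a release callback queued after a kernel cannot free the buffer before that kernel has run.

```python
from collections import deque


class ToyStream:
    """Hypothetical stand-in for a device stream: runs queued tasks in FIFO order."""

    def __init__(self):
        self._tasks = deque()

    def enqueue(self, fn):
        self._tasks.append(fn)

    def synchronize(self):
        # Drain the queue in submission order.
        while self._tasks:
            self._tasks.popleft()()


class ToyStreamAllocator:
    """Hypothetical per-stream allocator: no global singleton holds the buffers."""

    def __init__(self, stream):
        self.stream = stream
        self.live = set()      # handles that are currently allocated
        self._next_id = 0

    def allocate(self, size):
        handle = (self._next_id, size)
        self._next_id += 1
        self.live.add(handle)
        return handle

    def release_after_pending_work(self, handle):
        # The free is queued as a callback, so it runs only after
        # everything already submitted to the stream has finished.
        self.stream.enqueue(lambda: self.live.discard(handle))


stream = ToyStream()
allocator = ToyStreamAllocator(stream)
workspace = allocator.allocate(1024)

seen_alive = []
# A "kernel" that needs the workspace while it runs.
stream.enqueue(lambda: seen_alive.append(workspace in allocator.live))
allocator.release_after_pending_work(workspace)
stream.synchronize()
```

After `synchronize()`, the kernel has observed a live workspace and the allocator no longer tracks it. A real implementation would also have to deal with thread safety and with callbacks running on a driver thread, which this sketch ignores.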

12542320

Implement the GPU kernel of fc operator (#19687) · a65c728e

Committed by Yiqun Liu on Sep 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Move the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

a65c728e

05 Sep, 2019 3 commits
- fix the diff between async mode and async_half mode (#19535) · 2f037c31
  Committed by 123malin on Sep 05, 2019
```
* test=develop,  communicator merge add => merge average
```
  2f037c31
- unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631) · 3ae939e4
  Committed by Tao Luo on Sep 05, 2019
```
* remove assert.h

* change PADDLE_ASSERT_MSG to PADDLE_ENFORCE

test=develop

* fix tensorrt paddle_enforce

test=develop
```
  3ae939e4
- paddle::framework::vectorize() templatization (#19627) · d6c85c96
  Committed by Tao Luo on Sep 05, 2019
```
test=develop
```
  d6c85c96
04 Sep, 2019 1 commit
- refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607) · 0a46d345
  Committed by Tao Luo on Sep 04, 2019
```
test=develop
```
  0a46d345
03 Sep, 2019 2 commits
- refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) · 75d15719
  Committed by Tao Luo on Sep 03, 2019
```
test=develop
```
  75d15719
- replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586) · 49523ea1
  Committed by Tao Luo on Sep 03, 2019
```
* remove unused PADDLE_ASSERT(_IS_NOT_ERROR)

* replace PADDLE_ASSERT with PADDLE_ASSERT_MSG

test=develop
```
  49523ea1
02 Sep, 2019 1 commit
- fix the compilation issue on windows caused by mkl_CSRMM (#19533) · 84c72801
  Committed by zhouwei25 on Sep 02, 2019
  
  84c72801
29 Aug, 2019 1 commit
- fix softmax seg fault in AVX, test=develop (#19487) · 11f2f784
  Committed by Zeng Jinle on Aug 29, 2019
  
  11f2f784
20 Aug, 2019 1 commit

Use sparse matrix to implement fused emb_seq_pool operator (#19064) · b9203958

Committed by Yihua Xu on Aug 20, 2019

* Implement the operator with sparse matrix multiply

* Update the URL of mklml library.

test=develop

* Disable MKLML implementation on non-Linux platforms.

test=develop

* Ignore the deprecated status for windows

test=develop

b9203958

19 Aug, 2019 1 commit
- change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205) · af0fbd90
  Committed by silingtong123 on Aug 19, 2019
```
* print error code if cuda related API fails
```
  af0fbd90
01 Aug, 2019 1 commit
- Fix depthwise conv gpu kernel bug (#18582) · 22fa4c2d
  Committed by LielinJiang on Aug 01, 2019
```
* fix depthwise conv gpu kernel bug, test=develop
* add more depthwise conv test, test=develop
```
  22fa4c2d
24 Jul, 2019 1 commit

Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60

Committed by Bob Zhu on Jul 24, 2019

* extend matmul op to support multiple head multiplication

With multi-head support, the multiplication of two big matrices is
split into the multiplication of several (head_number) small matrices. E.g. if
Mat A is [3, 24] and Mat B is [24, 4], when multiplying A and B with head_number
as 4, Mat A will be split into 4 matrices of [3, 6] and Mat B into 4 matrices of
[6, 4]. The final result will be 4 matrices of [3, 4], i.e. [3, 16].
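
The split described in this message can be sketched in a few lines of plain Python. This is an illustration only, not the operator's code: `matmul` and `multi_head_matmul` are hypothetical helpers, and the sketch assumes the heads are contiguous slices (columns of A, rows of B) whose per-head products are concatenated along the column axis, which is what the [3, 16] result shape suggests.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def multi_head_matmul(a, b, head_number):
    """Multiply A [M, K] by B [K, N] head by head.

    A is cut into head_number column blocks of width K / head_number and
    B into head_number row blocks of the same height. Each pair gives an
    [M, N] block; the blocks are laid side by side as [M, head_number * N].
    """
    k = len(b)
    assert k % head_number == 0, "K must be divisible by head_number"
    step = k // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        lo, hi = h * step, (h + 1) * step
        a_h = [row[lo:hi] for row in a]  # [M, K / head_number]
        b_h = b[lo:hi]                   # [K / head_number, N]
        for out_row, block_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(block_row)
    return out


# The shapes from the commit message: A is [3, 24], B is [24, 4], 4 heads.
a = [[i + k for k in range(24)] for i in range(3)]
b = [[k * (j + 1) for j in range(4)] for k in range(24)]
c = multi_head_matmul(a, b, 4)
print(len(c), len(c[0]))  # 3 16
```

With `head_number` set to 1 the function reduces to an ordinary matrix product, which is a convenient sanity check for the slicing.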

220eef60

28 Jun, 2019 1 commit
- Add a unittest to inplace elementwise_add (#18385) · f5641000
  Committed by Zeng Jinle on Jun 28, 2019
```
* add_elementwise_add_inplace_test,test=develop

* rename file, test=develop
```
  f5641000
25 Jun, 2019 1 commit

Sequence mask support tensor (#18249) · df2eee71

Committed by Hongyu Liu on Jun 25, 2019

* sequence mask supports max length tensor input; test=develop

* add rnn_impl.py; test=develop

* add basic gru lstm unittest; test=develop

* fix api spec; test=develop

* fix sequence_mask op bug;
test=develop
test=document_preview

* change +-*x to elementwise_op; test=develop

* add mkl flag; test=develop

* fix rnn impl bug; test=develop

* update api spec; test=develop

* fix doc bug; test=develop

* fix lstm bugs; test=develop

df2eee71

14 Jun, 2019 1 commit
- Optimize fused_elewise_activation_grad op. (#18041) · 660c1a65
  Committed by Yiqun Liu on Jun 14, 2019
```
test=develop
```
  660c1a65
12 Jun, 2019 1 commit
- Optimize the concat and split cuda implementation for cases when the number of... · 7e463c84
  Committed by Yiqun Liu on Jun 12, 2019
```
Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979)

test=develop
```
  7e463c84
10 Jun, 2019 1 commit

Enable seq_pool op to accept len 0 input (#17284) · 33d1e565

Committed by Yibing Liu on Jun 10, 2019

* Enable seq_pool op to accept len 0 input

test=develop

* Update sequence_pool's api

test=develop

* Add more unittest cases for seq_pool op

test=develop

* Remove legacy comments

test=develop

* Don't use template in op maker

test=develop

33d1e565

30 May, 2019 1 commit

Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236) · 8fd39f3e

Committed by Yiqun Liu on May 30, 2019

* Enhance fused_elementwise_activation op.
test=develop

* Move the api fused_elementwise_activation to contrib.
test=develop

* Add including files.
test=develop

* Add support for sigmoid in fused_elementwise_activation op.

* Update API.spec.
test=develop

8fd39f3e

29 May, 2019 1 commit

Optimize the concat and split kernel for special cases when the number of... · 5782ddda

Committed by Yiqun Liu on May 29, 2019

Optimize the concat and split kernel for special cases when the number of inputs/outputs is 2 (#17415)

* Optimize the concat and split kernel for special cases that the number of inputs/outputs is 2.
test=develop

* Refine codes.
test=develop

* Correct the condition.
test=develop

* Move the define of tmp_data outside the if statement.

* Print the cudnn minor version.
test=develop

* Fix the case when in_num/o_num is 1 in concat/split op.
test=develop

* Remove const_cast.
test=develop

5782ddda

24 May, 2019 1 commit

[CPU] refine cpu softmax bwd (#17534) · 7ae461eb

Committed by tensor-tang on May 24, 2019

* refine softmax fwd

test=develop

* refine cpu softmax bwd

test=develop

* fix batch size

test=develop

* fix compile issue with gpu

test=develop

* add value clip

7ae461eb

23 May, 2019 1 commit

[CPU] refine softmax op fwd on CPU (#17522) · 0600b370

Committed by tensor-tang on May 23, 2019

* refine softmax fwd

test=develop

* fix compile issue with gpu

test=develop

* add value clip to avoid exp

0600b370

21 May, 2019 1 commit

fix security bugs : (#17464) · ba70cc49

Committed by liuwei1031 on May 21, 2019

http://newicafe.baidu.com:80/issue/PaddleSec-33/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-28/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-25/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-24/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-21/show?from=page
http://newicafe.baidu.com:80/issue/PaddleSec-20/show?from=page

test=develop

ba70cc49

16 May, 2019 1 commit

Add conditional compile for gru opt (#17368) · b02f2aff

Committed by zhaoyuchen2018 on May 16, 2019

* improve gru unit performance.
refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Add conditional compile for gru opt

Do not enable gru opt if compute capability < 700

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

b02f2aff

15 May, 2019 1 commit
- Optimize the sequence padding op (#17403) · 0823a7bc
  Committed by Krzysztof Binias on May 15, 2019
```
test=develop
```
  0823a7bc
10 May, 2019 1 commit

improve gru unit performance. (#16338) · 8a2caacd

Committed by zhaoyuchen2018 on May 10, 2019

refine code

fuse the cuBLAS calls and kernels into one CUDA kernel.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

8a2caacd

07 May, 2019 1 commit

Softmax_cross_entropy op add axis (#16806) · a71d8fdb

Committed by Kaipeng Deng on May 07, 2019

* add attr axis infershape. test=develop

* add CUDA kernel. test=develop

* fix unittest. test=develop

* fix unittest for soft_label. test=develop

* fix fp16 unittest. test=develop

* remove comment code. test=develop

* refine test for axis. test=develop

* add python api. test=develop

* fix doc. test=develop

* fix fp16 unittest. test=develop

* fix ngraph test. test=develop

* fix ENFORCE for test_imperative_transformer. test=develop

* fit for ngraph test. test=develop

* fix after rebase develop. test=develop

* fix doc. test=develop

* fix API.spec. test=develop

* fix test_layers. test=develop

* fix format. test=develop

a71d8fdb

20 Apr, 2019 1 commit

Support seq len equal to 0 in sequence ops (#16935) · 3c375751

Committed by Yibing Liu on Apr 20, 2019

* Support seq len equal to 0 in sequence ops

test=develop

* Add more test cases

* Fix some comments

test=develop

* Fix py3 error

test=develop

3c375751

17 Apr, 2019 1 commit

fix overflow by int32 mul test=develop (#16794) · c474e7dd

Committed by Kevin on Apr 17, 2019

* fix overflow by int32 mul test=develop

* fix reference nullptr

* fix codestyle test=develop

* modify to pointer in ContextProjectFunctor test=develop

* modify to pointer in ContextProjectFunctor test=develop

* modify . to -> test=develop
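
The kind of overflow this commit title refers to can be illustrated with a small sketch. The commit message does not say which product overflowed, so `offset_int32`, `offset_int64`, and the row and width values below are hypothetical. The sketch models two's-complement wraparound in plain Python, which is what typically happens on common hardware when a 32-bit product is too large (in C++ the overflow of a signed `int` is formally undefined behavior).

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an exact integer to what a 32-bit signed register would hold."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def offset_int32(row, width):
    """Element offset computed with a 32-bit multiply (may wrap)."""
    return wrap_int32(row * width)


def offset_int64(row, width):
    """Element offset computed in a wider type; exact for these sizes."""
    return row * width


# Hypothetical sizes: both factors fit in int32, their product does not.
row, width = 60_000, 50_000
assert row <= INT32_MAX and width <= INT32_MAX
narrow = offset_int32(row, width)
wide = offset_int64(row, width)
print(narrow < 0, wide)
```

The usual fix is to widen one operand before multiplying, so that the product is computed in a 64-bit type rather than widened after it has already wrapped.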

c474e7dd

12 Apr, 2019 3 commits
- fix cpplint test=develop · faae1b41
  Committed by Qiao Longfei on Apr 12, 2019
  
  faae1b41
- add cpu_merge_add_multi_noduplicated_test test=develop · 0a8ff2ec
  Committed by Qiao Longfei on Apr 12, 2019
  
  0a8ff2ec
- optimize merge add if the input rows of all selected rows are not duplicated · 920a9609
  Committed by Qiao Longfei on Apr 12, 2019
  
  920a9609
25 Mar, 2019 1 commit
- fix format. test=develop · 90bd038d
  Committed by dengkaipeng on Mar 25, 2019
  
  90bd038d
20 Mar, 2019 2 commits
- fix sequence pad; test=develop · 1580be5d
  Committed by phlrain on Mar 20, 2019
  
  1580be5d
- add jit kernel for softmax axis. test=develop · 93701dba
  Committed by dengkaipeng on Mar 20, 2019
  
  93701dba
18 Mar, 2019 2 commits
- refine softmax kernel. test=develop · 6c641827
  Committed by dengkaipeng on Mar 18, 2019
  
  6c641827
- remove resize then seq num == 1; test=develop · 802b3348
  Committed by phlrain on Mar 18, 2019
  
  802b3348
14 Mar, 2019 2 commits
- revert revert 16144 · 5a92e4c0
  Committed by sneaxiy on Mar 14, 2019
```
test=develop
```
  5a92e4c0
- Revert "PaddingRNN model memory optimize" · a91964c8
  Committed by Zeng Jinle on Mar 14, 2019
```
test=develop
```
  a91964c8
