Commits · 8a2caacdbc938c35a4535b4e47f5626b504e2972 · s920243400 / PaddleDetection

10 May, 2019 1 commit

improve gru unit performance. (#16338) · 8a2caacd

Committed by zhaoyuchen2018 on May 10, 2019

refine code

fuse cublas  calling and kernels into one cuda kernel.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

8a2caacd

07 May, 2019 1 commit

Softmax_cross_entropy op add axis (#16806) · a71d8fdb

Committed by Kaipeng Deng on May 07, 2019

* add attr axis infershape. test=develop

* add CUDA kernel. test=develop

* fix unittest. test=develop

* fix unittest for soft_label. test=develop

* fix fp16 unittest. test=develop

* remove comment code. test=develop

* refine test for axis. test=develop

* add python api. test=develop

* fix doc. test=develop

* fix fp16 unittest. test=develop

* fix ngraph test. test=develop

* fix ENFORCE for test_imperative_transformer. test=develop

* fit for ngraph test. test=develop

* fix after rebase develop. test=develop

* fix doc. test=develop

* fix API.spec. test=develop

* fix test_layers. test=develop

* fix format. test=develop

a71d8fdb
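
To illustrate what an `axis` attribute means for an op of this kind, the sketch below computes softmax followed by hard-label cross entropy along either axis of a 2-D nested list. It is plain Python, not Paddle's kernel; the function name `softmax_cross_entropy_2d` and its signature are hypothetical.

```python
import math

def softmax_cross_entropy_2d(logits, labels, axis=1):
    """Hypothetical helper: softmax + hard-label cross entropy on a 2-D list.

    axis=1: each row is one sample, labels[i] is a column index.
    axis=0: each column is one sample, labels[j] is a row index.
    """
    if axis == 0:
        # Transpose so that the class dimension is always the last one.
        logits = [list(col) for col in zip(*logits)]
    losses = []
    for sample, label in zip(logits, labels):
        m = max(sample)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in sample))
        losses.append(log_sum - sample[label])  # equals -log(softmax[label])
    return losses
```

Calling it with `axis=0` on a matrix gives the same losses as calling it with `axis=1` on the transposed matrix, which is the only property the sketch is meant to show.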

20 April, 2019 1 commit

Support seq len equal to 0 in sequence ops (#16935) · 3c375751

Committed by Yibing Liu on April 20, 2019

* Support seq len equal to 0 in sequence ops

test=develop

* Add more test cases

* Fix some comments

test=develop

* Fix py3 error

test=develop

3c375751
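
A minimal sketch of the zero-length case, assuming sequences are packed into one flat list and delimited by cumulative offsets. The function `sequence_sum_pool`, its `pad_value` parameter, and the choice to emit a pad value for an empty sequence are assumptions for illustration, not the behaviour of the ops changed in this commit.

```python
def sequence_sum_pool(values, offsets, pad_value=0.0):
    """Hypothetical sum pooling over sequences packed into one flat list.

    offsets[i]:offsets[i + 1] delimits sequence i, so two equal
    neighbouring offsets describe an empty sequence.
    """
    pooled = []
    for start, end in zip(offsets, offsets[1:]):
        if start == end:
            # Zero-length sequence: emit the pad value instead of failing.
            pooled.append(pad_value)
        else:
            pooled.append(sum(values[start:end]))
    return pooled
```

With `values = [1.0, 2.0, 3.0, 4.0, 5.0]` and `offsets = [0, 2, 2, 5]`, the middle sequence is empty and the call returns `[3.0, 0.0, 12.0]`.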

17 April, 2019 1 commit

fix overflow by int32 mul test=develop (#16794) · c474e7dd

Committed by Kevin on April 17, 2019

* fix overflow by int32 mul test=develop

* fix reference nullptr

* fix codestyle test=develop

* modify to point in ContextProjectFunctor test=develop

* modify to point in ContextProjectFunctor test=develop

* modify . to -> test=develop

c474e7dd
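
The class of bug named in the title can be sketched as follows. The example simulates storing a product in a 32-bit signed integer; the shape values are made up, and the helper names are hypothetical. In C++, signed overflow is undefined behaviour, so the two's-complement wraparound shown here is only the outcome typically observed, not a guarantee.

```python
def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range (two's-complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

def mul_int32(a, b):
    """Multiply as if the result were stored in a 32-bit signed int."""
    return wrap_int32(a * b)

# Made-up tensor shape whose element count exceeds 2**31 - 1.
rows, cols = 70000, 40000
wrapped = mul_int32(rows, cols)  # what a 32-bit product would hold
exact = rows * cols              # what a 64-bit product would hold
```

Here `exact` is 2800000000 while `wrapped` is -1494967296, which is why index arithmetic on large tensors is usually widened to 64 bits before multiplying.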

12 April, 2019 3 commits
- Q
  
  fix cpplint test=develop · faae1b41
  Committed by Qiao Longfei on April 12, 2019
  
  faae1b41
- Q
  
  add cpu_merge_add_multi_noduplicated_test test=develop · 0a8ff2ec
  Committed by Qiao Longfei on April 12, 2019
  
  0a8ff2ec
- Q
  
  optimize merge add if input rows of all selected rows is not duplicated · 920a9609
  Committed by Qiao Longfei on April 12, 2019
  
  920a9609
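
The idea behind the "not duplicated" fast path can be sketched in plain Python. The representation below (lists of `(row_id, value)` pairs) and the `rows_unique` flag are hypothetical and stand in for a sparse row container; this is not the code from the commit.

```python
def merge_add(inputs, rows_unique=False):
    """Hypothetical merge of sparse inputs given as lists of (row_id, value).

    rows_unique=True is a caller-provided promise that no row id
    appears more than once across all inputs.
    """
    if rows_unique:
        # Nothing to accumulate: concatenating the pairs already is the sum.
        return sorted(pair for part in inputs for pair in part)
    merged = {}
    for part in inputs:
        for row, value in part:
            # General path: add values that share the same row id.
            merged[row] = merged.get(row, 0.0) + value
    return sorted(merged.items())
```

When the promise holds, both paths return the same result, and the fast path skips the per-row lookup and accumulation.
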
25 March, 2019 1 commit
- D
  
  fix format. test=develop · 90bd038d
  Committed by dengkaipeng on March 25, 2019
  
  90bd038d
20 March, 2019 2 commits
- P
  
  fix sequence pad; test=develop · 1580be5d
  Committed by phlrain on March 20, 2019
  
  1580be5d
- D
  
  add jit kernel for softmax axis. test=develop · 93701dba
  Committed by dengkaipeng on March 20, 2019
  
  93701dba
18 March, 2019 2 commits
- D
  
  refine softmax kernel. test=develop · 6c641827
  Committed by dengkaipeng on March 18, 2019
  
  6c641827
- P
  
  remove resize then seq num == 1; test=develop · 802b3348
  Committed by phlrain on March 18, 2019
  
  802b3348
14 March, 2019 2 commits
- S
  revert revert 16144 · 5a92e4c0
  Committed by sneaxiy on March 14, 2019
```
test=develop
```
  5a92e4c0
- Z
  Revert "PaddingRNN model memory optimize" · a91964c8
  Committed by Zeng Jinle on March 14, 2019
```
test=develop
```
  a91964c8
12 March, 2019 1 commit
- S
  refine code · b26e9bd2
  Committed by sneaxiy on March 12, 2019
```
test=develop
```
  b26e9bd2
08 March, 2019 3 commits
- T
  simplify the jitkernel templates and tests · 14a764c9
  Committed by tensor-tang on March 08, 2019
```
test=develop
```
  14a764c9
- Y
  Make parent_idx a dispensable output for beam_search op to support models... · 66ead07e
  Committed by Yiqun Liu on March 08, 2019
```
Make parent_idx a dispensable output for beam_search op to support models saved by older paddle version. (#16106)

test=develop
```
  66ead07e
- Y
  Make parent_idx a dispensable output for beam_search op to support models... · 5bde1202
  Committed by Yiqun Liu on March 08, 2019
```
Make parent_idx a dispensable output for beam_search op to support models saved by older paddle version. (#16106)

test=develop
```
  5bde1202
07 March, 2019 1 commit
- T
  unify the kernelfuncs cache and add unit test · 802f362a
  Committed by tensor-tang on March 07, 2019
```
test=develop
```
  802f362a
04 March, 2019 3 commits
- Y
  Fix error in CUDA kernel of beam_search. (#15957) · c90b82a6
  Committed by Yiqun Liu on February 28, 2019
```
test=develop
```
  c90b82a6
- Y
  Optimize gelu operation with mkl erf. · b48d56e8
  Committed by Yihua Xu on February 26, 2019
```
test=develop
```
  b48d56e8
- Q
  
  improve communicator · 3691a46f
  Committed by Qiao Longfei on March 04, 2019
  
  3691a46f
28 February, 2019 1 commit
- Y
  Fix error in CUDA kernel of beam_search. (#15957) · 87248281
  Committed by Yiqun Liu on February 28, 2019
```
test=develop
```
  87248281
26 February, 2019 1 commit
- Y
  Optimize gelu operation with mkl erf. · 73967886
  Committed by Yihua Xu on February 26, 2019
```
test=develop
```
  73967886
22 February, 2019 2 commits

T
Revert 15770 develop a6910f90 gelu mkl opt (#15872) · ee2321de
Committed by tensor-tang on February 22, 2019
```
* Revert "Optimze Gelu with MKL Erf function (#15770)"

This reverts commit 676995c8.

* test=develop
```
ee2321de

Optimze Gelu with MKL Erf function (#15770) · 676995c8

Committed by Yihua Xu on February 22, 2019

* Optimize for gelu operator

* Set up the low accuracy mode of MKL ERF function.

test=develop

* Only enable MKLML ERF when OS is linux

* Use the speical mklml version included vmsErf function to verify gelu mkl kernel.

test=develop

* Add the CUDA macro to avoid NVCC's compile issue.

test=develop

* Add the TODO comments for mklml library modification.

test=develop

* Clean Code

test=develop

* Add the comment of marco for NVCC compiler.

test=develop

676995c8
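
For reference, the erf-based form of GELU that an `erf` routine would accelerate can be written as a short sketch. This uses Python's standard `math.erf`, not the MKL function mentioned in the commit, and is only meant to show the formula GELU(x) = 0.5 · x · (1 + erf(x / √2)).

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with the standard normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

A vectorized kernel would apply the same expression element-wise, with the `erf` call being the part a math library can speed up.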

19 February, 2019 1 commit
- X
  update comment · f2262d73
  Committed by xuezhong on February 19, 2019
```
test=develop
```
  f2262d73
11 February, 2019 1 commit
- X
  pass test for lstm op · fb9a6a2b
  Committed by xuezhong on February 11, 2019
```
test=develop
```
  fb9a6a2b
02 February, 2019 1 commit
- P
  fix dependency · 061299be
  Committed by peizhilin on February 02, 2019
```
test=develop
```
  061299be
30 January, 2019 5 commits
- X
  
  remove debug print · 4c98c2cc
  Committed by xuezhong on January 30, 2019
  
  4c98c2cc
- X
  
  add sample_logits op · 58ad40cc
  Committed by xuezhong on January 30, 2019
  
  58ad40cc
- X
  
  add cell clip and proj clip, fix bug for h0 · 88083632
  Committed by xuezhong on January 30, 2019
  
  88083632
- T
  add analyzer_transformer_test · 3d0ecab4
  Committed by Tao Luo on January 30, 2019
```
test=develop
```
  3d0ecab4
- Y
  Return parent_idx in beam_search op (#15520) · 16d54f7f
  Committed by Yiqun Liu on January 30, 2019
```
* Refine beam_search_op to output an extra parent_idx tensor.
test=develop

* Fix the unittest test_beam_search_op.
test=develop

* Fix the merging mistake.
test=develop
```
  16d54f7f
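
A rough sketch of what a `parent_idx` output could carry, assuming it records which beam each selected candidate was expanded from. The function `beam_search_step` and its input layout are hypothetical; the real op works on tensors and handles details (batching, end tokens) that are omitted here.

```python
def beam_search_step(candidates, beam_size):
    """Hypothetical single step of beam search for one source sentence.

    candidates[b] is a list of (token_id, score) pairs proposed by beam b.
    Returns the selected token ids, their scores, and parent_idx, where
    parent_idx[k] is the index of the beam that item k was expanded from.
    """
    flat = [
        (score, beam, token)
        for beam, items in enumerate(candidates)
        for token, score in items
    ]
    # Highest score first; ties broken by beam index, then token id.
    flat.sort(key=lambda item: (-item[0], item[1], item[2]))
    top = flat[:beam_size]
    ids = [token for _, _, token in top]
    scores = [score for score, _, _ in top]
    parent_idx = [beam for _, beam, _ in top]
    return ids, scores, parent_idx
```

A decoder can use `parent_idx` to gather the hidden states of the surviving beams before the next step, which is the usual reason to expose such an index.
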
29 January, 2019 3 commits
- T
  cache fc kernel · a18c0d42
  Committed by tensor-tang on January 29, 2019
```
test=develop
```
  a18c0d42
- T
  cache softmax kernel func · 6e1ee7fb
  Committed by tensor-tang on January 29, 2019
```
test=develop
```
  6e1ee7fb
- T
  refine softmax and use with cache · d59f7335
  Committed by tensor-tang on January 28, 2019
```
test=develop
```
  d59f7335
24 January, 2019 2 commits

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on January 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the seletced items of beam_search_op's CPU kernel sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

3008fa12

T
nce add check sample lables, test=develop (#15463) · 5cfc40de
Committed by tangwei12 on January 24, 2019
```
* nce add check sample lables, test=develop
```
5cfc40de

21 January, 2019 1 commit

Memory optimization of depthwise conv op and group norm op (#15313) · 9f8f0fc2

Committed by Dun on January 21, 2019

* mem opt

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine with cub test=develop

* fix mkldnn test && remove comments && test=develop

* polish code && test=develop

* add only_forward test && test=develop

9f8f0fc2

s920243400 / PaddleDetection is in sync with the fork source project