Commits · 676995c86cb4b49f9a41c7a32c5e054b16201753 · BaiXuePrincess / Paddle

22 Feb, 2019 1 commit

Optimize Gelu with MKL Erf function (#15770) · 676995c8

Committed by Yihua Xu on Feb 22, 2019

* Optimize for gelu operator

* Set up the low accuracy mode of MKL ERF function.

test=develop

* Only enable MKLML ERF when OS is linux

* Use the special mklml version that includes the vmsErf function to verify the gelu mkl kernel.

test=develop

* Add the CUDA macro to avoid NVCC's compile issue.

test=develop

* Add the TODO comments for mklml library modification.

test=develop

* Clean Code

test=develop

* Add a comment on the macro for the NVCC compiler.

test=develop
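
For context, the erf-based GELU that this commit optimizes can be sketched as below. This is a minimal illustration only: it uses Python's `math.erf` as a stand-in for the vectorized MKL `vmsErf` call named in the commit message, and the helper names `gelu` and `gelu_batch` are hypothetical, not Paddle APIs.

```python
import math


def gelu(x: float) -> float:
    """Erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_batch(xs):
    """Apply GELU element-wise; a vectorized erf would process the whole
    buffer in one call instead of looping (assumption, not shown here)."""
    return [gelu(x) for x in xs]


if __name__ == "__main__":
    for value in (-1.0, 0.0, 1.0):
        print(value, gelu(value))
```

A vectorized erf routine replaces the per-element loop with a single call over the input buffer, which is presumably where the speedup in the commit comes from.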

02 Feb, 2019 1 commit
- fix dependency · 061299be
  Committed by peizhilin on Feb 02, 2019
  ```
  test=develop
  ```
30 Jan, 2019 1 commit

Return parent_idx in beam_search op (#15520) · 16d54f7f

Committed by Yiqun Liu on Jan 30, 2019

* Refine beam_search_op to output an extra parent_idx tensor.
test=develop

* Fix the unittest test_beam_search_op.
test=develop

* Fix the merging mistake.
test=develop
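
To illustrate what a `parent_idx` output conveys, here is a simplified single-sentence beam-search step. It is a sketch under stated assumptions: the function name `beam_search_step` and its plain-list inputs are hypothetical, and the real op works on Paddle tensors rather than Python lists.

```python
def beam_search_step(prev_scores, step_log_probs, beam_size):
    """One beam-search step for a single source sentence.

    prev_scores[i]       accumulated score of beam i
    step_log_probs[i][w] log-probability of token w when extending beam i
    Returns (selected_ids, selected_scores, parent_idx), where
    parent_idx[k] is the index of the beam that candidate k extends.
    """
    candidates = []
    for parent, (score, log_probs) in enumerate(zip(prev_scores, step_log_probs)):
        for token_id, log_prob in enumerate(log_probs):
            candidates.append((score + log_prob, token_id, parent))

    # Highest score first; ties broken by parent index, then token id.
    candidates.sort(key=lambda c: (-c[0], c[2], c[1]))
    top = candidates[:beam_size]

    selected_scores = [c[0] for c in top]
    selected_ids = [c[1] for c in top]
    parent_idx = [c[2] for c in top]
    return selected_ids, selected_scores, parent_idx


if __name__ == "__main__":
    ids, scores, parents = beam_search_step(
        [-0.25, -0.5], [[-0.5, -2.0], [-0.5, -3.0]], beam_size=2
    )
    print(ids, scores, parents)
```

With `parent_idx` available, a caller can reorder per-beam state (for example decoder hidden states) to follow the surviving beams without recomputing which beam each selected token came from.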

29 Jan, 2019 3 commits
- cache fc kernel · a18c0d42
  Committed by tensor-tang on Jan 29, 2019
  ```
  test=develop
  ```
- cache softmax kernel func · 6e1ee7fb
  Committed by tensor-tang on Jan 29, 2019
  ```
  test=develop
  ```
- refine softmax and use with cache · d59f7335
  Committed by tensor-tang on Jan 28, 2019
  ```
  test=develop
  ```
24 Jan, 2019 2 commits

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the exception for the is_empty op in PrepareData.
test=develop

3008fa12

T
nce add check sample lables, test=develop (#15463) · 5cfc40de
由 tangwei12 提交于 1月 24, 2019
```
* nce add check sample lables, test=develop
```
5cfc40de

21 Jan, 2019 1 commit

Memory optimization of depthwise conv op and group norm op (#15313) · 9f8f0fc2

Committed by Dun on Jan 21, 2019

* mem opt

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine with cub test=develop

* fix mkldnn test && remove comments && test=develop

* polish code && test=develop

* add only_forward test && test=develop

18 Jan, 2019 1 commit

Tree conv op (#15217) · e2ba9668

Committed by zhaozhehao on Jan 18, 2019

* refactor tree2col operator with new memory mechanism test=develop

* test=develop

* test=develop

* Modified API according to panyx0718 test=develop

* fix API change according to heavengate test=develop

* Modify API comment test=develop

14 Jan, 2019 1 commit
- fix gru_gpu_kernel test=develop · 4d15515c
  Committed by Qiao Longfei on Jan 14, 2019
13 Jan, 2019 3 commits
- fix build problem test=develop · 4feae253
  Committed by Qiao Longfei on Jan 13, 2019
- update avx gru grad kernel test=develop · 4c7be265
  Committed by Qiao Longfei on Jan 13, 2019
- update gru_grad_op · 9b16e540
  Committed by Qiao Longfei on Jan 13, 2019
  ```
  test=develop
  ```
10 Jan, 2019 1 commit

[Feature] support mix precision training for resnet (#14899) · fd854183

Committed by Wu Yi on Jan 10, 2019

* clip softmax for fp16

* updates

* fuse xent support fp16 test=develop

* wip

* wip

* add simple row reduce

* wip fp16 accurate softmax

* add accurate softmax kernel for fp16 test=develop

* update test=develop

* fix cpu build test=develop

* update api.spec test=develop

* follow comments test=develop

* fix build test=develop

* fix trt build test=develop

* fix inference build test=develop

* fix merge test=develop

* update test=develop

* try fix build test=develop

* fix build test=develop

* rename real_exp test=develop

* fortest

* remove hacky kernels test=develop

* clean up test=develop
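
The commit messages above mention clipping and an "accurate" softmax for fp16 but do not describe the method. The following is a minimal sketch of one common way to keep softmax numerically safe, assuming max subtraction plus a lower clip on the shifted logits. The name `clipped_softmax` and the threshold `CLIP_MIN = -64.0` are illustrative assumptions, and plain Python floats are double precision, so this does not reproduce fp16 rounding.

```python
import math

# Illustrative lower bound for shifted logits (assumption, not taken from the commit).
CLIP_MIN = -64.0


def clipped_softmax(logits):
    """Softmax with max subtraction and a lower clip on the shifted logits."""
    row_max = max(logits)
    # After subtracting the max, every value is <= 0, so exp() cannot overflow.
    shifted = [max(x - row_max, CLIP_MIN) for x in logits]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(clipped_softmax([0.0, 0.0]))
    print(clipped_softmax([1000.0, 0.0]))
```

Subtracting the row maximum prevents overflow in `exp`, and the lower clip keeps very small probabilities from collapsing to exactly zero in this double-precision sketch.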

09 Jan, 2019 1 commit
- follow comment test=develop · c3b9edf9
  Committed by Qiao Longfei on Jan 09, 2019
08 Jan, 2019 2 commits
- Revert "Revert "Remove op handle lock"" · ed409ac9
  Committed by sneaxiy on Jan 08, 2019
  ```
  test=develop
  ```
- Revert "Remove op handle lock" · dacfaaa9
  Committed by Zeng Jinle on Jan 08, 2019
  ```
  test=develop
  ```
07 Jan, 2019 2 commits
- use height from params of jitcode · 0145f40f
  Committed by tensor-tang on Jan 05, 2019
- update gru op forward kernel · 3e1b914f
  Committed by Qiao Longfei on Jan 07, 2019
04 Jan, 2019 3 commits
- add jitcode impl and use it · c50060bb
  Committed by tensor-tang on Dec 29, 2018
- add seqpool jitkernel test and benchmark · 142bb417
  Committed by tensor-tang on Dec 29, 2018
- use seqpool jitkernel · e58a569c
  Committed by tensor-tang on Dec 28, 2018
02 Jan, 2019 1 commit
- remove_op_handle_lock · d0a8a1e9
  Committed by sneaxiy on Jan 02, 2019
  ```
  test=develop
  ```
29 Dec, 2018 1 commit
- remove tensor core lock · d25395fc
  Committed by sneaxiy on Dec 29, 2018
  ```
  test=develop
  ```
28 Dec, 2018 1 commit
- sum op support empty selected rows as input · 25d44d40
  Committed by Qiao Longfei on Dec 28, 2018
25 Dec, 2018 1 commit

Move GetTensor to tensor_util (#15011) · b9fb03cf

Committed by chengduo on Dec 25, 2018

* refine tensor
test=develop

* refine tensor
test=develop

* fix device_context log
test=develop

21 Dec, 2018 2 commits

[Feature] Add Temporary Allocator (#14875) · 79bd6dfa

Committed by chengduo on Dec 21, 2018

* Add Temporary Allocator

* add Temporary Allocator to DeviceContext
test=develop

* code refine
test=develop

* fix mean_iou
test=develop

* Add DeviceTemporaryAllocator
test=develop

* fix conv_op bug
test=develop

* small fix
test=develop

* code refine
test=develop

* log refine
test=develop

* fix unit test
test=develop

* move double check

* refine concat_and_split
test=develop

* add limit_of_temporary_allocation
test=develop

* fix name
test=develop

79bd6dfa

M
Remove unnessesary code · 0a4b6fc0
由 minqiyang 提交于 12月 21, 2018
```
test=develop
```
0a4b6fc0

20 Dec, 2018 2 commits
- fix enum style · 1aaec571
  Committed by tensor-tang on Dec 20, 2018
  ```
  test=develop
  ```
- Accelerate lstm · 454db666
  Committed by minqiyang on Dec 20, 2018
19 Dec, 2018 3 commits
- clean code and remove unused files · d53c4756
  Committed by tensor-tang on Dec 19, 2018
  ```
  test=develop
  ```
- fix the build issue · 0b4f742e
  Committed by peizhilin on Dec 19, 2018
  ```
  test=develop
  ```
- disable xbyak on windows · 1cc9d598
  Committed by peizhilin on Dec 19, 2018
  ```
  test=develop
  ```
18 Dec, 2018 6 commits
- fix build · 6648995f
  Committed by tensor-tang on Dec 17, 2018
- rewrite ddim · a500dfa5
  Committed by sneaxiy on Dec 18, 2018
  ```
  test=develop
  ```
- fix bug after merge reyoung optimization, test=develop · b5fa9164
  Committed by JiabinYang on Dec 18, 2018
- Fix the mkl build script on windows · fa135bbf
  Committed by peizhilin on Dec 18, 2018
  ```
  test=develop
  ```
- include the mkl fix only · b601f2de
  Committed by peizhilin on Dec 18, 2018
  ```
  test=develop
  ```
- add mkl,ctc support for windows · 5a6d7fe2
  Committed by peizhilin on Dec 18, 2018

BaiXuePrincess / Paddle is in sync with the fork source project
