Commits · 2c5c7b2a7e08a3f19322e8e748544c4874468773 · BaiXuePrincess / Paddle

22 Feb, 2019 · 2 commits

Revert 15770 develop a6910f90 gelu mkl opt (#15872) · ee2321de
Committed by tensor-tang on Feb 22, 2019
```
* Revert "Optimize Gelu with MKL Erf function (#15770)"

This reverts commit 676995c8.

* test=develop
```

Optimize Gelu with MKL Erf function (#15770) · 676995c8

Committed by Yihua Xu on Feb 22, 2019

* Optimize for gelu operator

* Set up the low accuracy mode of MKL ERF function.

test=develop

* Only enable MKLML ERF when OS is linux

* Use the special mklml version that includes the vmsErf function to verify the gelu mkl kernel.

test=develop

* Add the CUDA macro to avoid NVCC's compile issue.

test=develop

* Add the TODO comments for mklml library modification.

test=develop

* Clean Code

test=develop

* Add the comment on the macro for the NVCC compiler.

test=develop
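
For readers unfamiliar with why an erf routine matters to this commit: GELU is conventionally defined as `0.5 * x * (1 + erf(x / sqrt(2)))`, so a fast vectorized erf speeds up the whole operator. The sketch below is not code from the commit; it only illustrates that standard definition in plain Python, next to the widely used tanh approximation for comparison. The function names `gelu_erf` and `gelu_tanh` are hypothetical.

```python
import math


def gelu_erf(x: float) -> float:
    """GELU via the error function: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh-based approximation of GELU, shown only for comparison."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print both variants side by side for a few sample inputs.
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(value, gelu_erf(value), gelu_tanh(value))
```

A library such as MKL would apply the erf step to a whole array at once; the scalar version here is only meant to make the formula concrete.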


19 Feb, 2019 · 1 commit
- update comment · f2262d73
  Committed by xuezhong on Feb 19, 2019
```
test=develop
```
11 Feb, 2019 · 1 commit
- pass test for lstm op · fb9a6a2b
  Committed by xuezhong on Feb 11, 2019
```
test=develop
```
02 Feb, 2019 · 1 commit
- fix dependency · 061299be
  Committed by peizhilin on Feb 02, 2019
```
test=develop
```
30 Jan, 2019 · 4 commits
- remove debug print · 4c98c2cc
  Committed by xuezhong on Jan 30, 2019
- add sample_logits op · 58ad40cc
  Committed by xuezhong on Jan 30, 2019
- add cell clip and proj clip, fix bug for h0 · 88083632
  Committed by xuezhong on Jan 30, 2019
- Return parent_idx in beam_search op (#15520) · 16d54f7f
  Committed by Yiqun Liu on Jan 30, 2019
```
* Refine beam_search_op to output an extra parent_idx tensor.
test=develop

* Fix the unittest test_beam_search_op.
test=develop

* Fix the merging mistake.
test=develop
```
29 Jan, 2019 · 3 commits
- cache fc kernel · a18c0d42
  Committed by tensor-tang on Jan 29, 2019
```
test=develop
```
- cache softmax kernel func · 6e1ee7fb
  Committed by tensor-tang on Jan 29, 2019
```
test=develop
```
- refine softmax and use with cache · d59f7335
  Committed by tensor-tang on Jan 28, 2019
```
test=develop
```
24 Jan, 2019 · 2 commits

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

nce add check sample labels, test=develop (#15463) · 5cfc40de
Committed by tangwei12 on Jan 24, 2019
```
* nce add check sample labels, test=develop
```

21 Jan, 2019 · 1 commit

Memory optimization of depthwise conv op and group norm op (#15313) · 9f8f0fc2

Committed by Dun on Jan 21, 2019

* mem opt

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine code  test=develop

* refine with cub test=develop

* fix mkldnn test && remove comments && test=develop

* polish code && test=develop

* add only_forward test && test=develop


18 Jan, 2019 · 1 commit

Tree conv op (#15217) · e2ba9668

Committed by zhaozhehao on Jan 18, 2019

* refactor tree2col operator with new memory mechanism test=develop

* test=develop

* test=develop

* Modified API according to panyx0718 test=develop

* fix API change according to heavengate test=develop

* Modify API comment test=develop


14 Jan, 2019 · 1 commit
- fix gru_gpu_kernel test=develop · 4d15515c
  Committed by Qiao Longfei on Jan 14, 2019
13 Jan, 2019 · 3 commits
- fix build problem test=develop · 4feae253
  Committed by Qiao Longfei on Jan 13, 2019
- update avx gru grad kernel test=develop · 4c7be265
  Committed by Qiao Longfei on Jan 13, 2019
- update gru_grad_op · 9b16e540
  Committed by Qiao Longfei on Jan 13, 2019
```
test=develop
```
10 Jan, 2019 · 1 commit

[Feature] support mix precision training for resnet (#14899) · fd854183

Committed by Wu Yi on Jan 10, 2019

* clip softmax for fp16

* updates

* fuse xent support fp16 test=develop

* wip

* wip

* add simple row reduce

* wip fp16 accurate softmax

* add accurate softmax kernel for fp16 test=develop

* update test=develop

* fix cpu build test=develop

* update api.spec test=develop

* follow comments test=develop

* fix build test=develop

* fix trt build test=develop

* fix inference build test=develop

* fix merge test=develop

* update test=develop

* try fix build test=develop

* fix build test=develop

* rename real_exp test=develop

* fortest

* remove hacky kernels test=develop

* clean up test=develop


09 Jan, 2019 · 1 commit
- follow comment test=develop · c3b9edf9
  Committed by Qiao Longfei on Jan 09, 2019
08 Jan, 2019 · 2 commits
- Revert "Revert "Remove op handle lock"" · ed409ac9
  Committed by sneaxiy on Jan 08, 2019
```
test=develop
```
- Revert "Remove op handle lock" · dacfaaa9
  Committed by Zeng Jinle on Jan 08, 2019
```
test=develop
```
07 Jan, 2019 · 2 commits
- use height from params of jitcode · 0145f40f
  Committed by tensor-tang on Jan 05, 2019
- update gru op forward kernel · 3e1b914f
  Committed by Qiao Longfei on Jan 07, 2019
04 Jan, 2019 · 3 commits
- add jitcode impl and use it · c50060bb
  Committed by tensor-tang on Dec 29, 2018
- add seqpool jitkernel test and benchmark · 142bb417
  Committed by tensor-tang on Dec 29, 2018
- use seqpool jitkernel · e58a569c
  Committed by tensor-tang on Dec 28, 2018
02 Jan, 2019 · 1 commit
- remove_op_handle_lock · d0a8a1e9
  Committed by sneaxiy on Jan 02, 2019
```
test=develop
```
29 Dec, 2018 · 1 commit
- remove tensor core lock · d25395fc
  Committed by sneaxiy on Dec 29, 2018
```
test=develop
```
28 Dec, 2018 · 1 commit
- sum op support empty selected rows as input · 25d44d40
  Committed by Qiao Longfei on Dec 28, 2018
25 Dec, 2018 · 1 commit

Move GetTensor to tensor_util (#15011) · b9fb03cf

Committed by chengduo on Dec 25, 2018

* refine tensor
test=develop

* refine tensor
test=develop

* fix device_context log
test=develop


21 Dec, 2018 · 2 commits

[Feature] Add Temporary Allocator (#14875) · 79bd6dfa

Committed by chengduo on Dec 21, 2018

* Add Temporary Allocator

* add Temporary Allocator to DeviceContext
test=develop

* code refine
test=develop

* fix mean_iou
test=develop

* Add DeviceTemporaryAllocator
test=develop

* fix conv_op bug
test=develop

* small fix
test=develop

* code refine
test=develop

* log refine
test=develop

* fix unit test
test=develop

* move double check

* refine concat_and_split
test=develop

* add limit_of_temporary_allocation
test=develop

* fix name
test=develop

Remove unnecessary code · 0a4b6fc0
Committed by minqiyang on Dec 21, 2018
```
test=develop
```

20 Dec, 2018 · 2 commits
- fix enum style · 1aaec571
  Committed by tensor-tang on Dec 20, 2018
```
test=develop
```
- Accelerate lstm · 454db666
  Committed by minqiyang on Dec 20, 2018
19 Dec, 2018 · 3 commits
- clean code and remove unused files · d53c4756
  Committed by tensor-tang on Dec 19, 2018
```
test=develop
```
- fix the build issue · 0b4f742e
  Committed by peizhilin on Dec 19, 2018
```
test=develop
```
- disable xbyak on windows · 1cc9d598
  Committed by peizhilin on Dec 19, 2018
```
test=develop
```

BaiXuePrincess / Paddle is in sync with the fork's source project
