Commits · 3008fa1261ead553549aee8da576147701b7ef08 · PaddlePaddle / PaddleDetection

24 Jan, 2019 1 commit

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the Python API of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

17 Aug, 2018 1 commit

"fix float16 ShuffleDownSync Bug" (#12756) · 2673798d

Committed by dzhwinter on Aug 17, 2018

```
* "fix bug"

* "add test case"
```

30 Jul, 2018 1 commit

float16 type support enhance (#12181) · 39ac9e39

Committed by dzhwinter on Jul 30, 2018

```
* cherry picked

* "cherry picked platform"

* "add comment"

* "fix ci"
```

08 May, 2018 1 commit

add sync · 345737d0

Committed by chengduoZH on May 08, 2018

04 May, 2018 1 commit

wrap_shfl_x_sync · d36af62c

Committed by chengduoZH on May 03, 2018

03 May, 2018 2 commits

fix __shfl · e97c1a8c

Committed by chengduoZH on May 03, 2018

Fix __shfl_down_sync_ of cross_entropy (#10345) · 4fbde42c

Committed by chengduo on May 03, 2018

```
* fix __shfl_down_sync_ of cross_entropy

* use reduceSum

* "fix ci"
```

PaddlePaddle / PaddleDetection · synced successfully about 1 year ago