Commits · c49e604906551e1802c1da213450631b660d9daa · 机器未来 / Paddle

24 Jan, 2019 · 2 commits

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the exception for the is_empty op in PrepareData.
test=develop


lazy_allocator · 51227bd4

Committed by sneaxiy on Jan 23, 2019

```
test=develop
```

04 Dec, 2018 · 1 commit

[Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661) · 29d9fb53

Committed by Wu Yi on Dec 04, 2018

* wip multi process multi gpu dist training

* workable for p2p

* update test=develop

* change back env name test=develop

* fix alloc init

* fix cpu build test=develop

* fix mac tests test=develop

* refine code

* refine test=develop


27 Nov, 2018 · 1 commit
- minor fix · 38715e6f
  Committed by peizhilin on Nov 27, 2018
26 Nov, 2018 · 2 commits
- Revert the changes of VLOG · 53433d7f
  Committed by minqiyang on Nov 26, 2018
  ```
  test=develop
  ```
- Given the different fraction_of_gpu_memory_to_use depends on platform · b2f8d418
  Committed by peizhilin on Nov 26, 2018
22 Nov, 2018 · 2 commits

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Committed by chengduo on Nov 22, 2018

* refine cublas
test=develop

* code refine

* refine cublas

* add GEMME_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* to be compatible with gcc4.8
test=develop


fix unit test cases · 7c8c9dc9

Committed by peizhilin on Nov 22, 2018

08 Nov, 2018 · 1 commit
- Change the origin VLOG level to 10 times · 0c3227a5
  Committed by minqiyang on Nov 08, 2018
  ```
  Fix code to support cpplint syntax check

  test=develop
  ```
15 Oct, 2018 · 1 commit
- add cuda version display (#13885) · 2c9839c8
  Committed by chengduo on Oct 15, 2018
  ```
  test=develop
  ```
08 Oct, 2018 · 1 commit
- clarify the fraction_of_gpu_memory flag · ab798a28
  Committed by Xin Pan on Oct 08, 2018
  ```
  test=develop
  ```
27 Sep, 2018 · 1 commit
- Revert "Some trivial optimization (#13530)" · a4f7696a
  Committed by typhoonzero on Sep 27, 2018
  ```
  This reverts commit 1d91a49d.
  ```
26 Sep, 2018 · 1 commit

Some trivial optimization (#13530) · 1d91a49d

Committed by chengduo on Sep 26, 2018

* some trivial opt

* remove the fix of lod_tensor and shrink_rnn_memory_op

* refine ShrinkRNNMemoryOp

test=develop


14 Aug, 2018 · 1 commit
- refine by reviewer's advice · da39d84a
  Committed by chenweihang on Aug 14, 2018
08 Aug, 2018 · 1 commit
- polish high frequency enforce error message · 61052cdb
  Committed by chenweihang on Aug 08, 2018
23 Apr, 2018 · 1 commit
- Add synchronous TensorCopy and use it in double buffer · 9f11da59
  Committed by fengjiayi on Apr 23, 2018
08 Apr, 2018 · 1 commit
- Fix cpplint errors with paddle/fluid/platform/gpu_info.* (#9710) · 0c43a376
  Committed by Yi Wang on Apr 07, 2018
  ```
  * Fix cpplint errors with paddle/fluid/platform/gpu_info.*

  * Update
  ```
10 Mar, 2018 · 1 commit
- add gpu info func to get compute cap · 1998d5af
  Committed by Kexin Zhao on Mar 09, 2018
03 Mar, 2018 · 1 commit
- get max threads of GPU · 00e596ed
  Committed by chengduoZH on Mar 02, 2018
12 Feb, 2018 · 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Committed by qingqing01 on Feb 12, 2018
10 Feb, 2018 · 2 commits
- Correct #include path · fc374821
  Committed by Yi Wang on Feb 09, 2018
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Committed by Yi Wang on Feb 09, 2018
10 Jan, 2018 · 2 commits
- "fix CI" · a6edc038
  Committed by dzhwinter on Jan 09, 2018
- "add flags" · f0316bdb
  Committed by dzhwinter on Jan 09, 2018
22 Dec, 2017 · 1 commit

"remove GPU Sync Interface" (#6793) · abde3130

Committed by dzhwinter on Dec 22, 2017

* "remove GPU Sync Interface"

* "fix typo"

* "fix type cast error"

* "fix related Copy with stream"

* "fix failed tests with DevicePool"

* "fix stupid removed position error"


15 Dec, 2017 · 1 commit
- Simplize system_allocator and fix GPU_INFO (#6653) · 1b0c7d7c
  Committed by Yu Yang on Dec 15, 2017
05 Dec, 2017 · 1 commit
- fix bug in gpu default memory allocating policy (#6268) · 96a5f96c
  Committed by QI JUN on Dec 05, 2017
01 Dec, 2017 · 1 commit
- change GPU memory allocating policy (#6159) · d066b07f
  Committed by QI JUN on Dec 01, 2017
  ```
  * change GPU memory allocating policy

  * fix potential overflow bug
  ```
16 Nov, 2017 · 1 commit
- "fix accuracy kernel bug" (#5673) · e97b8987
  Committed by dzhwinter on Nov 15, 2017
  ```
  * "fix accuracy kernel bug"

  * "relaunch ci"
  ```
31 Oct, 2017 · 1 commit
- remove unused code (#5219) · afd1e844
  Committed by QI JUN on Oct 30, 2017
  ```
  * remove unused code

  * fix cmake file

  * fix build error
  ```
10 Oct, 2017 · 1 commit
- clean up for review · e5155713
  Committed by Yang Yang on Oct 09, 2017
07 Oct, 2017 · 1 commit
- fix executor gpu unittest · 1f5192a2
  Committed by qijun on Oct 06, 2017
05 Oct, 2017 · 2 commits
- Rename platform::GetDeviceCount into platform::GetCUDADeviceCount · 2b204f04
  Committed by Yi Wang on Oct 04, 2017
- fix gpu build error · fe10e86d
  Committed by qijun on Oct 04, 2017
18 Aug, 2017 · 1 commit
- Add ENVIRONMENT interface interface · 55437b58
  Committed by liaogang on Aug 18, 2017
19 Jul, 2017 · 1 commit
- Add cuda memcpy in gpu_info · b0588641
  Committed by liaogang on Jul 19, 2017
15 Jul, 2017 · 1 commit
- ENH: unify PADDLE_ENFORCE · f812de2c
  Committed by liaogang on Jul 15, 2017
13 Jul, 2017 · 1 commit
- ENH: Remove comments · ff98e3c1
  Committed by liaogang on Jul 13, 2017
11 Jul, 2017 · 1 commit
- FIX: merge conflicts · 383b96f3
  Committed by liaogang on Jul 11, 2017
06 Jul, 2017 · 1 commit
- ENH: add memory unit test · 74691789
  Committed by liaogang on Jul 06, 2017
