Commits · 27ee6e714046e8cf6dd913854da167233f7f7c41 · BaiXuePrincess / Paddle

18 Nov, 2022 · 1 commit

[PHI decoupling] move "gpu_device_function.h" from fluid to phi (#48097) · 27ee6e71

Committed by huangjiyi on Nov 18, 2022

* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi

* update copyright years

* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi

* fix rocm-complie bugs

27 Jul, 2022 · 1 commit

[DCU] Fix NAN problem when training BERT on DUC platform (#44643) · 28aa0c61

Committed by Yuang Liu on Jul 27, 2022

26 Jun, 2022 · 1 commit

format all files in fluid using new config (#43776) · 576236a0

Committed by Sing_chan on Jun 26, 2022

23 Feb, 2022 · 1 commit

[bf16] add bf16 kernel: elementwise_div (#39602) · ca4df333

Committed by zhangbo9674 on Feb 23, 2022

* add elementwise_div

* refine rocm

* refine code

* refine op register

* solve conflict

* refine unittest

* refine unittest precision

* add rocm

16 Feb, 2022 · 1 commit

[bf16] pten matmul cuda kernel support bf16 (#39485) · d5a0d31a

Committed by Leo Chen on Feb 16, 2022

* pten matmul cuda kernel support bf16

* fix pten kernel name

* add matmul_grad bf16 kernel

* add emptylike bf16 kernel

* fix compile

* suppport rocm

* fix error

* fix rocm

* add bf16 header file

* fix compile

12 Jan, 2022 · 1 commit

Adjust warpper of gpu_lanuch_config (#38654) · f5166284

Committed by limingshu on Jan 12, 2022

* first commit

* fix wrong filename

* fix the wrong spell name

* fix gpu config warper

* modify according to pr advices

* fix GpuLauchConfig1D api bugs

* change the config for dropout grad

* fix bugs

* modification according to pr advices

* modification according to pr advices

03 Dec, 2021 · 1 commit

refine structure for cuda and rocm (#37202) · a6d2fddb

Committed by ronnywang on Dec 03, 2021

* refine structure for cuda and rocm

* update

* update

* update

* update

02 Jun, 2021 · 1 commit

[ROCM] fix fused_fc_elementwise_layernorm, test=develop (#33281) · 3f366fee

Committed by Qi Li on Jun 02, 2021

25 May, 2021 · 1 commit

modify complex template for elementwise ops (#33071) · dbc08d69

Committed by chentianyu03 on May 25, 2021

* modify complex template for elementwise ops

* modify mul, div grad struct

* add complex template for CudaShuffleDownSync CudaShuffleXorSync funcs and fix the bug when delete cuda<9000

* fix shuffle func args bug

* fix shuffle func args bug

* fix shuffle func args bug

31 Mar, 2021 · 1 commit

delete cuda9 code (#31883) · ea738dda

Committed by tianshuo78520a on Mar 31, 2021

08 Feb, 2021 · 1 commit

[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913) · 93c1d9e7

Committed by Qi Li on Feb 08, 2021

01 Dec, 2020 · 1 commit

add complex64 and complex128 type; add +-*/@ and slice opreator for c… (#29199) · 8f45d142

Committed by chentianyu03 on Dec 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice opreator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

08 May, 2019 · 1 commit

Refine elementwise kernel. (#16952) · 792443ef

Committed by zhaoyuchen2018 on May 08, 2019

* Refine elementwise kernel.

Add a simple cuda kernel if grad x and y both exist
Use 2D block cuda kernel to do broadcast.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

31 Jan, 2019 · 1 commit

To make CUDA_LAUNCH_KERNEL_HELPER support large size. · 5dfce931

Committed by guoshengCS on Jan 31, 2019

test=develop

24 Jan, 2019 · 1 commit

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the seletced items of beam_search_op's CPU kernel sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

17 Aug, 2018 · 1 commit

"fix float16 ShuffleDownSync Bug" (#12756) · 2673798d

Committed by dzhwinter on Aug 17, 2018

* "fix bug"

* "add test case"

30 Jul, 2018 · 1 commit

float16 type support enhance (#12181) · 39ac9e39

Committed by dzhwinter on Jul 30, 2018

* cherry picked

* "cherry picked platform"

* "add comment"

* "fix ci"

08 May, 2018 · 1 commit

add sync · 345737d0

Committed by chengduoZH on May 08, 2018

04 May, 2018 · 1 commit

wrap_shfl_x_sync · d36af62c

Committed by chengduoZH on May 03, 2018

03 May, 2018 · 2 commits

fix __shfl · e97c1a8c

Committed by chengduoZH on May 03, 2018

Fix __shfl_down_sync_ of cross_entropy (#10345) · 4fbde42c

Committed by chengduo on May 03, 2018

* fix __shfl_down_sync_ of cross_entropy

* use reduceSum

* "fix ci"

BaiXuePrincess / Paddle is in sync with the fork's source project