Commits · 8c0bacd45eb385941f6d8663a500aa6c46b0a038 · BaiXuePrincess / Paddle

25 Oct, 2021 1 commit

Add fused_attention_op: add impl wrappers. (#35903) (#36673) · 8c0bacd4

Committed by Li Min on Oct 25, 2021

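Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's scheduling overhead for ops, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory access overhead, this PR applies two optimizations:
(1) When computing q, k and v, the shared input X is used to reduce the gemm, transpose and bias add at that point from three calls each to one.
(2) Kernel fusion is used so that data is passed between different CUDA kernels through registers.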
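Optimization (1) can be illustrated with a minimal pure-Python sketch. It is not the PR's C++/CUDA code: the helper names (`matmul`, `add_bias`, `separate_qkv`, `fused_qkv`) and the tiny matrices are hypothetical, and the transpose step is left out. It only shows that, because q, k and v share the input X, one gemm and one bias add on concatenated weights give the same result as three separate calls.

```python
# Illustrative sketch only; none of these helpers are Paddle APIs.

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of m."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def separate_qkv(x, wq, wk, wv, bq, bk, bv):
    """Three gemm + bias-add calls, each reading the shared input x again."""
    return (add_bias(matmul(x, wq), bq),
            add_bias(matmul(x, wk), bk),
            add_bias(matmul(x, wv), bv))

def fused_qkv(x, wq, wk, wv, bq, bk, bv):
    """One gemm + one bias add on column-concatenated weights, then a split."""
    w = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]  # shape (k x 3n)
    b = bq + bk + bv                                      # length 3n
    out = add_bias(matmul(x, w), b)
    n = len(bq)
    q = [row[:n] for row in out]
    k = [row[n:2 * n] for row in out]
    v = [row[2 * n:] for row in out]
    return q, k, v

# Small made-up inputs: 2 tokens, hidden size 2.
x = [[1.0, 2.0], [3.0, 4.0]]
wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[2.0, 1.0], [1.0, 2.0]]
wv = [[0.5, -1.0], [1.5, 0.0]]
bq, bk, bv = [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]

fused = fused_qkv(x, wq, wk, wv, bq, bk, bv)
separate = separate_qkv(x, wq, wk, wv, bq, bk, bv)
```

In this sketch the fused path reads `x` once instead of three times, which is the memory-access saving the commit message describes.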

8c0bacd4

08 Sep, 2021 1 commit

fix the bug of layer_norm when batch_size=1 (#35480) · ad5f7494

Committed by zhangkaihuo on Sep 08, 2021

The bug is an incorrect, out-of-bounds access to mean and var: both have shape [batch_size], but the thread index idx ranges over 0~feature_size, so mean[idx] and var[idx] read past the end of the arrays.

When batch_size=1, the correct access is mean[0] and var[0]. A unit test with batch_size=1 is added.
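The indexing problem described above can be reproduced in a small Python sketch. This is not the patched kernel: the function names are hypothetical, Python's `IndexError` stands in for the silent out-of-bounds read a CUDA kernel would perform, and indexing the statistics by row (which reduces to `mean[0]` when batch_size=1) is an assumption about the general form of the fix.

```python
# Illustrative sketch only; names are hypothetical and not taken from the PR.
import math

def row_stats(x):
    """Per-row mean and variance: one entry per batch row, i.e. shape [batch_size]."""
    means = [sum(r) / len(r) for r in x]
    variances = [sum((v - m) ** 2 for v in r) / len(r) for r, m in zip(x, means)]
    return means, variances

def normalize_buggy(x, eps=1e-5):
    """Indexes the per-row statistics with the feature index (the bug)."""
    means, variances = row_stats(x)
    out = []
    for row in x:
        # BUG: idx runs over 0..feature_size-1, but means/variances have batch_size entries.
        out.append([(v - means[idx]) / math.sqrt(variances[idx] + eps)
                    for idx, v in enumerate(row)])
    return out

def normalize_fixed(x, eps=1e-5):
    """Indexes the per-row statistics with the row index."""
    means, variances = row_stats(x)
    return [[(v - means[r]) / math.sqrt(variances[r] + eps) for v in row]
            for r, row in enumerate(x)]

x = [[1.0, 2.0, 3.0, 4.0]]  # batch_size=1, feature_size=4

y = normalize_fixed(x)

try:
    normalize_buggy(x)
    buggy_failed = False
except IndexError:
    # means has a single entry, so means[1] is already out of bounds.
    buggy_failed = True
```

With batch_size=1 the buggy version fails as soon as the feature index reaches 1, which is why a unit test with batch_size=1 exposes the problem.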

ad5f7494

23 Aug, 2021 1 commit

Refactor the organization of layer_norm cuda impl. (#34883) · 7f5eb533

Committed by Li Min on Aug 23, 2021

Refactor the organization of layer_norm cuda impl so that it can be reused in fused attention op.

Extract the layer_norm cuda impl from layer_norm_op.cu to layer_norm_kernel.cu.h.
Define fused/attention_layer_norm.h, which can be used in fused attention op in next PR.

7f5eb533

24 Jun, 2021 1 commit

fix bug when the cuda kernel config exceeds dims max (#33748) · 56692f66

Committed by Leo Chen on Jun 24, 2021

56692f66

22 Jun, 2021 1 commit

fix gpt2 train loss Nan problem (#33658) · 687571f2

Committed by zhiboniu on Jun 22, 2021

687571f2

15 Jun, 2021 1 commit

1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape (#33535) · c5a6ae4c

Committed by Shang Zhizhou on Jun 15, 2021

```
* 1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape

* remove useless code
```

c5a6ae4c

12 Jun, 2021 1 commit

Fix LayerNorm Problem (#33420) · fe94db6c

Committed by zhiboniu on Jun 12, 2021

* Eliminate numerical differences of LayerNorm; fix LayerNorm Nan Bug while large data input

* fix bug while large shape of data input

fe94db6c

08 Jun, 2021 1 commit

add dynamic layer_norm plugin (#33293) · 45d1ae21

Committed by Shang Zhizhou on Jun 08, 2021

* add dynamic layer_norm plugin

* fix bug

* fix numpy.allclose

* fix format

* fix code style

* remove shepe in dynamic shape

* code format

* remove layer norm fp16

* fix format

45d1ae21

19 Mar, 2021 1 commit

[ROCM] fix layer_norm, norm, p_norm, test_sequence_softmax_op, test_math_op_patch_var_base (#31709) · 420527f0

Committed by ronnywang on Mar 19, 2021

420527f0

02 Mar, 2021 1 commit

[ROCM] update fluid operators for rocm (part8), test=develop (#31309) · 59940cb3

Committed by Qi Li on Mar 02, 2021

59940cb3

15 Jan, 2021 1 commit

Fix float64 bug in layer norm (#30452) · 008b0a8b

Committed by Yang Zhang on Jan 15, 2021

```
built-in `rsqrt` is shadowed
```

008b0a8b

14 Dec, 2020 1 commit

Fix compile problem when cuda_arch < 6000 (#29576) · c0163837

Committed by Leo Chen on Dec 14, 2020

```
* fix compile problem when cuda_arch < 6000

* refine code

* refine code
```

c0163837

10 Dec, 2020 1 commit

Layernorm opt (#29522) · 9f926eb7

Committed by Leo Chen on Dec 10, 2020

* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for windows CI compatibility

* merge develop

Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

9f926eb7

07 Dec, 2020 1 commit

fix layer_norm accuracy (#29434) · a040c055

Committed by Leo Chen on Dec 07, 2020

a040c055

02 Dec, 2020 1 commit

Layer norm fp16 (#29169) · 7584bb50

Committed by furnace on Dec 02, 2020

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

7584bb50

14 May, 2020 1 commit

API/OP (group_norm, layer_norm, random_crop, unpool) error message enhancement (#24413) · 9f83f0fe

Committed by lijianshe02 on May 14, 2020

```
* API/OP (group_norm, layer_norm, unpool) error message enhancement test=develop
```

9f83f0fe

20 Apr, 2020 1 commit

restrict block num of layer_norm_grad cuda block to 128 (#23878) · 7d4002e0

Committed by mapingshuo on Apr 20, 2020

```
restrict block num of layer_norm_grad cuda kernel to 128, test=develop
```

7d4002e0

06 Jan, 2020 1 commit

Add TRT support for BERT (#21135) · 0a51098a

Committed by Pei Yang on Jan 06, 2020

* add gelu plugin

* align trt bert with gpu

* add support for fused fc with relu,

* add unittest for bert trt

0a51098a

05 Sep, 2018 1 commit

Use double to reduce · f57d706a

Committed by Yu Yang on Sep 05, 2018

f57d706a

08 Aug, 2018 1 commit

refine layer_norm · ad45d392

Committed by sneaxiy on Aug 06, 2018

ad45d392

12 Feb, 2018 1 commit

Fix the grammar in copyright. (#8403) · 24509f4a

Committed by qingqing01 on Feb 12, 2018

24509f4a

10 Feb, 2018 2 commits

Correct #include path · fc374821

Committed by Yi Wang on Feb 09, 2018

fc374821

Move file to fluid/; Edit CMakeLists.txt · 90648f33

Committed by Yi Wang on Feb 09, 2018

90648f33

03 Feb, 2018 2 commits

unifid GPU and CPU implementation · e0333735

Committed by chengduoZH on Feb 03, 2018

e0333735

Add layer norm [GPU] · 76e188e5

Committed by chengduoZH on Feb 02, 2018

76e188e5

BaiXuePrincess / Paddle is in sync with the fork source project
