- 29 Jan 2022, 1 commit
Committed by Li Min
* Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.
* Remove useless code.
* Remove useless code.
* Optimize layer_norm fwd when cols is 1024.
* Remove useless code.
* Minors.
* Minors.
* Modifications according to reviews.
* Minors.
* Optimize layer_norm bwd kernel when cols is 1024.
* Polish layer_norm_bwd_1024 kernel.
* Limit ln_bwd_1024_kernel to paddle_with_cuda.
* Fix double type compile error.
* Add optimization of ln bwd for fused_dropout_add_ln op.
* Polish codes.
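
A hedged sketch of what the "cols is 1024" specialization mentioned above can look like (illustrative only; the kernel name, block size, variance formula, and epsilon handling are assumptions, not Paddle's implementation): a block of 256 threads owns one 1024-column row, each thread loads four contiguous floats, and the mean/variance reduction runs in shared memory so the row never leaves the block.

```cuda
// One block = one row of exactly 1024 columns; 256 threads * float4 loads.
// Assumes x and y are 16-byte aligned and rows have exactly 1024 columns.
__global__ void LayerNormFwd1024(const float* __restrict__ x,
                                 const float* __restrict__ gamma,
                                 const float* __restrict__ beta,
                                 float* __restrict__ y, float epsilon) {
  constexpr int kCols = 1024;
  const int row = blockIdx.x;
  const float4 v =
      reinterpret_cast<const float4*>(x + row * kCols)[threadIdx.x];

  // Per-thread partial sums, then a block-wide tree reduction.
  __shared__ float s_sum[256], s_sqsum[256];
  s_sum[threadIdx.x] = v.x + v.y + v.z + v.w;
  s_sqsum[threadIdx.x] = v.x * v.x + v.y * v.y + v.z * v.z + v.w * v.w;
  __syncthreads();
  for (int stride = 128; stride > 0; stride >>= 1) {
    if (threadIdx.x < stride) {
      s_sum[threadIdx.x] += s_sum[threadIdx.x + stride];
      s_sqsum[threadIdx.x] += s_sqsum[threadIdx.x + stride];
    }
    __syncthreads();
  }
  const float mean = s_sum[0] / kCols;
  const float var = s_sqsum[0] / kCols - mean * mean;
  const float inv_std = rsqrtf(var + epsilon);

  // Normalize the four values this thread owns and write them back.
  float4 out;
  const int c = threadIdx.x * 4;
  out.x = (v.x - mean) * inv_std * gamma[c] + beta[c];
  out.y = (v.y - mean) * inv_std * gamma[c + 1] + beta[c + 1];
  out.z = (v.z - mean) * inv_std * gamma[c + 2] + beta[c + 2];
  out.w = (v.w - mean) * inv_std * gamma[c + 3] + beta[c + 3];
  reinterpret_cast<float4*>(y + row * kCols)[threadIdx.x] = out;
}

// Assumed launch: one block per row.
// LayerNormFwd1024<<<rows, 256>>>(x, gamma, beta, y, 1e-5f);
```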

- 26 Jan 2022, 1 commit
Committed by Li Min
* Optimize layer_norm fwd when cols is 1024.

- 17 Dec 2021, 1 commit
Committed by sneaxiy
* support multi precision update for LAMB
* hide some api
* fix ci uts
* fix lamb output of dygraph
* remove some changes to some PR
* try to fix Py3 CI compile error
* fix test_imperative_optimizer, add lars ut, add layer_norm ut
* fix ut, fix format
* fix ut
* fix windows ci
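
For the "multi precision update" item above, here is a minimal sketch of the general master-weight pattern (shown with a plain SGD step for brevity; LAMB adds trust-ratio scaling on top, and all names here are assumptions, not Paddle's API): the optimizer keeps an fp32 master copy of each fp16 parameter, applies the update in fp32, and writes a rounded fp16 view back for the forward/backward pass.

```cuda
#include <cuda_fp16.h>

// Hypothetical multi-precision update: fp16 params/grads, fp32 master weights.
__global__ void MultiPrecisionSgdUpdate(__half* param_fp16,
                                        float* master_param,
                                        const __half* grad_fp16,
                                        float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float g = __half2float(grad_fp16[i]);  // upcast the fp16 gradient
    float p = master_param[i] - lr * g;    // update in full precision
    master_param[i] = p;                   // exact fp32 master copy
    param_fp16[i] = __float2half(p);       // rounded copy used by the model
  }
}
```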

- 03 Dec 2021, 1 commit
Committed by ronnywang
* refine structure for cuda and rocm
* update
* update
* update
* update

- 23 Sep 2021, 1 commit
Committed by Li Min

- 08 Sep 2021, 1 commit
Committed by zhangkaihuo
The bug was incorrect access to mean and var, which could read out of bounds: the shape of mean and var is [batch_size], while the thread idx ranges over 0~feature_size, so mean[idx] and var[idx] are wrong. When batch_size=1, the correct access is mean[0] and var[0]; a unit test with batch_size=1 is added.
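
To make the out-of-bounds access concrete, here is a minimal hypothetical CUDA sketch (not the actual Paddle kernel; the names, layout, and epsilon are assumptions): mean and var hold one value per row, so they must be indexed by the row index, never by the per-feature thread index.

```cuda
#include <cuda_runtime.h>

__global__ void LayerNormApply(const float* x, const float* mean,
                               const float* var, float* y,
                               int batch_size, int feature_size) {
  int row = blockIdx.x;  // one block per row, row < batch_size
  for (int idx = threadIdx.x; idx < feature_size; idx += blockDim.x) {
    // Bug as described: mean[idx] / var[idx] reads past the end of the
    // [batch_size] arrays whenever idx >= batch_size.
    // float m = mean[idx];
    // float v = var[idx];

    // Fix: index by the row; with batch_size == 1 this is mean[0], var[0].
    float m = mean[row];
    float v = var[row];
    y[row * feature_size + idx] =
        (x[row * feature_size + idx] - m) * rsqrtf(v + 1e-5f);
  }
}
```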

- 23 Aug 2021, 1 commit
Committed by Li Min
Refactor the organization of the layer_norm CUDA impl so that it can be reused in the fused attention op. Extract the layer_norm CUDA impl from layer_norm_op.cu into layer_norm_kernel.cu.h. Define fused/attention_layer_norm.h, which can be used in the fused attention op in the next PR.

- 24 Jun 2021, 1 commit
Committed by Leo Chen

- 22 Jun 2021, 1 commit
Committed by zhiboniu

- 15 Jun 2021, 1 commit
Committed by Shang Zhizhou
* 1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape
* remove useless code

- 12 Jun 2021, 1 commit
Committed by zhiboniu
* Eliminate numerical differences of LayerNorm; fix LayerNorm NaN bug with large data input
* fix bug with large shape of data input
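
A hedged sketch of the usual root cause and remedy for this kind of NaN (illustrative only; not necessarily the exact change made in this commit): computing variance as E[x^2] - E[x]^2 can overflow or come out slightly negative for large inputs, and taking sqrt/rsqrt of that yields NaN, whereas Welford's single-pass update keeps the running mean and squared-deviation sum numerically stable.

```cuda
// Welford's algorithm: single pass, numerically stable mean and variance.
__host__ __device__ void WelfordMeanVar(const float* x, int n,
                                        float* mean_out, float* var_out) {
  float mean = 0.f;
  float m2 = 0.f;  // sum of squared deviations from the current mean
  for (int i = 0; i < n; ++i) {
    float delta = x[i] - mean;
    mean += delta / (i + 1);
    m2 += delta * (x[i] - mean);
  }
  *mean_out = mean;
  *var_out = (n > 0) ? m2 / n : 0.f;
}
```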

- 08 Jun 2021, 1 commit
Committed by Shang Zhizhou
* add dynamic layer_norm plugin
* fix bug
* fix numpy.allclose
* fix format
* fix code style
* remove shape in dynamic shape
* code format
* remove layer norm fp16
* fix format

- 19 Mar 2021, 1 commit
Committed by ronnywang

- 02 Mar 2021, 1 commit
Committed by Qi Li

- 15 Jan 2021, 1 commit
Committed by Yang Zhang
built-in `rsqrt` is shadowed
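
A tiny hypothetical illustration of what "built-in `rsqrt` is shadowed" means (not the Paddle code; names are assumptions): a local variable named `rsqrt` hides CUDA's built-in device function, so either the local must be renamed or the built-in reached unambiguously, for example via `rsqrtf`.

```cuda
__device__ float InverseStd(float variance, float epsilon) {
  float rsqrt = 0.f;                     // shadows the built-in ::rsqrt
  // rsqrt = rsqrt(variance + epsilon);  // would not compile: `rsqrt` is now a float
  rsqrt = rsqrtf(variance + epsilon);    // unambiguous float overload of the built-in
  return rsqrt;
}
```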

- 14 Dec 2020, 1 commit
Committed by Leo Chen
* fix compile problem when cuda_arch < 6000
* refine code
* refine code
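
One common cause of a "compile problem when cuda_arch < 6000" in reduction-heavy kernels, and the standard guard for it (a hedged sketch of the well-known CUDA programming guide pattern, not necessarily the change made here): native double-precision atomicAdd only exists for compute capability 6.0 and above, so older architectures need a CAS-based fallback.

```cuda
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
// Fallback for pre-Pascal GPUs: emulate double atomicAdd with atomicCAS.
__device__ double AtomicAddDouble(double* address, double val) {
  unsigned long long int* address_as_ull =
      reinterpret_cast<unsigned long long int*>(address);
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);
}
#else
// Compute capability >= 6.0 provides double atomicAdd natively.
__device__ double AtomicAddDouble(double* address, double val) {
  return atomicAdd(address, val);
}
#endif
```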

- 10 Dec 2020, 1 commit
Committed by Leo Chen
* layernorm fw opt
* layernorm bw opt
* fix typo, test=develop
* remove const dim3 for windows CI compatibility
* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

- 07 Dec 2020, 1 commit
Committed by Leo Chen

- 02 Dec 2020, 1 commit
Committed by furnace
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
* fix with_mkldnn compile error for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
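
A hedged illustration of the "static_cast<float> to static_cast<U>" detail above (names and structure are assumptions, not Paddle's code): with an fp16 input type T, the statistics are accumulated in a wider type U, so the casts must target U rather than a hard-coded float in case U is something other than float.

```cuda
// T: storage type of the row (e.g. half); U: accumulation type (e.g. float).
template <typename T, typename U>
__device__ void AccumulateRow(const T* row, int cols, U* mean, U* var) {
  U sum = static_cast<U>(0);
  U sqsum = static_cast<U>(0);
  for (int i = 0; i < cols; ++i) {
    U v = static_cast<U>(row[i]);  // was static_cast<float>(row[i])
    sum += v;
    sqsum += v * v;
  }
  *mean = sum / cols;
  *var = sqsum / cols - (*mean) * (*mean);
}
```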

- 14 May 2020, 1 commit
Committed by lijianshe02
* API/OP (group_norm, layer_norm, unpool) error message enhancement test=develop

- 20 Apr 2020, 1 commit
Committed by mapingshuo
restrict block num of layer_norm_grad cuda kernel to 128, test=develop
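
A minimal sketch of the grid-capping idea this message describes (illustrative; the kernel body and names are assumptions): cap the launch at 128 blocks and let each block walk the rows with a grid-stride loop, so correctness never depends on the grid being one block per row.

```cuda
constexpr int kMaxBlockNum = 128;

__global__ void LayerNormGradSketch(const float* dy, float* dx,
                                    int rows, int cols) {
  // Grid-stride loop over rows: the result is the same for any gridDim.x,
  // so the launch side may cap the block count at kMaxBlockNum.
  for (int row = blockIdx.x; row < rows; row += gridDim.x) {
    for (int col = threadIdx.x; col < cols; col += blockDim.x) {
      dx[row * cols + col] = dy[row * cols + col];  // placeholder for real grad math
    }
  }
}

// Assumed launch: never request more than kMaxBlockNum blocks.
// int block_num = rows < kMaxBlockNum ? rows : kMaxBlockNum;
// LayerNormGradSketch<<<block_num, 256>>>(dy, dx, rows, cols);
```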

- 06 Jan 2020, 1 commit
Committed by Pei Yang
* add gelu plugin
* align trt bert with gpu
* add support for fused fc with relu
* add unittest for bert trt
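
For reference, the GELU activation that such a plugin evaluates, written in its common tanh approximation (a general formula, not code taken from the plugin itself):

```cuda
// gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
__device__ float GeluTanhApprox(float x) {
  const float kAlpha = 0.7978845608028654f;  // sqrt(2 / pi)
  return 0.5f * x * (1.f + tanhf(kAlpha * (x + 0.044715f * x * x * x)));
}
```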

- 05 Sep 2018, 1 commit
Committed by Yu Yang

- 08 Aug 2018, 1 commit
Committed by sneaxiy

- 12 Feb 2018, 1 commit
Committed by qingqing01

- 10 Feb 2018, 2 commits

- 03 Feb 2018, 2 commits
Committed by chengduoZH
Committed by chengduoZH