Commits · 5664ea26a0c2ed61bca5857877a3bc6ef0a1d01c · PaddlePaddle / Paddle

Apr 13, 2023 · 1 commit

[enforce.h Decouple logging.h] Delete glog/logging.h from enforce.h (#52651) · 5664ea26

Committed by HongyuJia on Apr 13, 2023

* [enforce.h Decouple logging.h] Delete glog/logging.h from enforce.h

* Add logging.h for profiler.cc

* Add logging.h for gloo_utils.h

* Add logging.h for addmm_kernel_impl.h

* Add logging.h for addmm_grad_kernel_impl.h

* Add logging.h for p_send_kernel.cu

* Add logging.h for determinant_grad_kernel_impl.h

* Add logging.h for p_recv_kernel.cu

* Add logging.h for elementwise_grad_base.h

* Add logging.h for transfer_layout_kernel.cc

* Add logging.h for eigvals_kernel.cc and index_select_impl.h

* Add logging.h for all files in kernel directory

* Add logging.h for xpu_info.cc

* Add logging.h for xpu

5664ea26

Apr 6, 2023 · 1 commit

Move fused_attention op to phi [Migrate forward GPU OpKernel] (#51743) · a7ec8958

Committed by Sonder on Apr 6, 2023

* add kernel functions

* update kernel functions

* update func parameters' name

* create codes for gpu device

* adjust file locations

* fix include error

* remove dependent files to phi/

* restore fused_attention_op.cu

* fix dependence errors

* fix dependence errors

* fix include error

* fix all dependence errors [build success]

* remove useless include

* recover useless include

* use phi::ToNCCLDataType

* fix namespace

* update new register code

* fix error in fused_gemm_epilogue_utils

* fix error in FusedAttentionKernel parm

* finish fused_attention register code [build success]

* add paddle::optional

* add sig file

* fix build error

* fix an include error

* update CMakeLists

* fix parameter sequence

* add include file

* update #if before include

* fix grammar error

* update codes for DropoutParam

* remove const cast

* trans some fluid api to phi api

* add #if

* update test code

* update test codes

* recover test codes

* trans fused_attention to fluid

* move #endif to end

* move #endif

* delete useless files

* use fused attention utils and recover random seed

* remove fluid include in phi

a7ec8958

Mar 14, 2023 · 1 commit

Optimization for layerNormGrad [Part1] (#51282) · 7a3d05d9

Committed by limingshu on Mar 14, 2023

* first commit

* fix code bugs in for_loop

* fix bugs in cuLoadAddStridedInputs.

* optimization for LayerNormBackwardComputeGradInput

* add unittest for validating the optimization

* fix windows ci error

7a3d05d9

Feb 21, 2023 · 1 commit

[PHI Decoupling]Remove memory header (Part1) (#50419) · 1cfcb71d

Committed by YuanRisheng on Feb 21, 2023

* decouple_memory

* perfect memory utils

* fix ci bugs

* fix inference bugs

* fix custom test bugs

* fix coverage bugs

* modify code according to comments

* modify namespace

* deal with compile bugs

1cfcb71d

Feb 16, 2023 · 1 commit

[Phi decouple] move layer_norm_kernel.cu.h to phi (#50506) · 8910bb4a

Committed by Huang Jiyi on Feb 16, 2023

* move layer_norm_kernel.cu.h to phi

* fix bugs

* fix namespace

* fix bugs

* fix CI-Windows

* replace mutable_data

* fix bugs

* fix bugs

8910bb4a

Dec 13, 2022 · 1 commit

Save fused_attention op memory when dropout_rate = 0.0 (#48902) · 428fb804

Committed by sneaxiy on Dec 13, 2022

* save fused_attention memory when dropout_rate = 0.0

* add ut

* fix ut bug

* fix fused_layernorm_residual_dropout_bias_test.cu

428fb804
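
The idea behind #48902 can be illustrated with a minimal sketch, assuming upscale-in-train dropout applied to a flat Python list: with `dropout_rate = 0.0` dropout is the identity and its mask would be all ones, so nothing has to be kept for the backward pass. `dropout_forward` and `dropout_backward` are hypothetical helpers written for illustration; they are not Paddle's fused_attention code.

```python
import random


def dropout_forward(x, rate, training=True, rng=None):
    """Upscale-in-train dropout on a list; returns (output, mask or None).

    With rate == 0.0 (or outside training) dropout is the identity, so no
    mask is allocated and the backward pass can pass gradients straight through.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(x), None  # identity: skip the mask entirely
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch deterministic
    keep = 1.0 - rate
    mask = [1 if rng.random() < keep else 0 for _ in x]
    out = [v * m / keep for v, m in zip(x, mask)]  # rescale the kept values
    return out, mask


def dropout_backward(grad_out, rate, mask):
    """Gradient of dropout_forward; a None mask means the forward was the identity."""
    if mask is None:
        return list(grad_out)
    keep = 1.0 - rate
    return [g * m / keep for g, m in zip(grad_out, mask)]
```

For example, `dropout_forward([1.0, 2.0, 3.0], 0.0)` returns the input unchanged together with `None`. Returning `None` instead of an all-ones mask is where the memory is saved; the backward pass treats a missing mask as a pass-through.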

Dec 7, 2022 · 1 commit

[phi::DenseTensor] Replace Tensor with phi::DenseTensor (#48682) · 65420271

Committed by 张春乔 on Dec 7, 2022

65420271

Nov 22, 2022 · 1 commit

[PHI decoupling] remove "gpu_device_function.h" in fluid. (#48117) · 4da1a0fe

Committed by huangjiyi on Nov 22, 2022

* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi

* update copyright years

* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi

* rm dependence to "gpu_device_function.h" in fluid

* rm gpu_device_function.h etc in fluid

* fix rocm-compile bugs

* fix cuda_helper_test.cu bugs

4da1a0fe

Sep 28, 2022 · 1 commit

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

Committed by Chen Weihang on Sep 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

e12a905e

Sep 21, 2022 · 1 commit

add layer_norm trt fp16 support (#45043) · b7a1ae22

Committed by ccrrong on Sep 21, 2022

* add fp16 support

* update

* update half

* code format

* fix unittest

* fix rocm compile error

* code format

* code format

* fix rocm compile error

* fix rocm compile error

b7a1ae22

Sep 18, 2022 · 1 commit

Add INT8 support for fused_multi_transformer_op (#45284) · 3d7e2118

Committed by RichardWooSJTU on Sep 18, 2022

3d7e2118

Sep 1, 2022 · 1 commit

remove circular dependency of device_context and allocator (#45455) · 934171ae

Committed by Leo Chen on Sep 1, 2022

* refine cmake of framework

* add deps for dense tensor

* fix deps

* remove alloc(ctx)

* add depends on mkldnn

934171ae

Jul 7, 2022 · 1 commit

Fix nan in fast_ln_fwd_kernel when cols > 1024 (#44125) · 33540e10

Committed by Zhang Zheng on Jul 7, 2022

* Fix nan in fast_ln_fwd_kernel when cols > 1024

* delete blas

33540e10

Jun 26, 2022 · 1 commit

format all files in fluid using new config (#43776) · 576236a0

Committed by Sing_chan on Jun 26, 2022

576236a0

Jun 10, 2022 · 1 commit

optimize bwd layer_norm kernel with fast method (#42491) · b4a93884

Committed by limingshu on Jun 10, 2022

b4a93884

Jun 5, 2022 · 1 commit

[code format check upgrade] step2: clang-format (#42840) · a3730dc8

Committed by Sing_chan on Jun 5, 2022

a3730dc8

Jun 2, 2022 · 1 commit

Extend forward fast layer_norm kernel to support more dimensions. (#43118) · 85baa3c0

Committed by Li Min on Jun 2, 2022

* extend forward fast_ln_kernel to support more column values.

85baa3c0

Apr 19, 2022 · 1 commit

fix ln (#41918) · 771a4144

Committed by sneaxiy on Apr 19, 2022

771a4144

Mar 17, 2022 · 1 commit

Move layer norm to phi (#40193) · 681a6865

Committed by hong on Mar 17, 2022

* update

* fix bugs; test=develop

* update; test=develop

* fix test compile error; test=develop

* fix cpu compile error; test=develop

* fix test error; test=develop

* fix layer_norm_op plugin error; test=develop

* fix error; test=develop

* fix test bug; test=develop

* update; test=develop

* polish code; test=develop

* fix bugs; test=develop

* remove unused dependency; test=develop

* polish code; test=develop

681a6865

Mar 4, 2022 · 1 commit

clean distribution_helper, index_impl, aligned_vector code in fluid (#40071) · b9672a1e

Committed by Leo Chen on Mar 4, 2022

* clean distribution_helper, index_impl, aligned_vector code in fluid

* fix conflicts

b9672a1e

Mar 1, 2022 · 1 commit

[bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978

Committed by zhangbo9674 on Mar 1, 2022

* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest

ce8ed978

Feb 20, 2022 · 1 commit

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

Feb 19, 2022 · 1 commit

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

Committed by Aurelius84 on Feb 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile successfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix test file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

2fe04264

Jan 29, 2022 · 1 commit

Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09

Committed by Li Min on Jan 29, 2022

* Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.

* Remove useless code.

* Remove useless code.

* Optimize layer_norm fwd when cols is 1024.

* Remove useless code.

* Minors.

* Minors.

* Modifications according to reviews.

* Minors.

* Optimize layer_norm bwd kernel when cols is 1024.

* Polish layer_norm_bwd_1024 kernel.

* Limit ln_bwd_1024_kernel to paddle_with_cuda.

* Fix double type compile error.

* Add optimization of ln bwd for fused_dropout_add_ln op.

* Polish codes.

99cfcc09

Jan 26, 2022 · 1 commit

Optimize layer norm forward when cols is 1024. (#39167) · 01d04be6

Committed by Li Min on Jan 26, 2022

* Optimize layer_norm fwd when cols is 1024.

01d04be6

Dec 17, 2021 · 1 commit

Refine some AMP operators for BERT (#37923) · d80fe268

Committed by sneaxiy on Dec 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

d80fe268

Dec 3, 2021 · 1 commit

refine structure for cuda and rocm (#37202) · a6d2fddb

Committed by ronnywang on Dec 3, 2021

* refine structure for cuda and rocm

* update

* update

* update

* update

a6d2fddb

Sep 23, 2021 · 1 commit

Add fused_attention_op: add impl wrappers. (#35903) · 88ea8e6f

Committed by Li Min on Sep 23, 2021

88ea8e6f

Sep 8, 2021 · 1 commit

fix the bug of layer_norm when batch_size=1 (#35480) · ad5f7494

Committed by zhangkaihuo on Sep 8, 2021

The bug was that mean and var were indexed incorrectly, so accesses could run past the end of the arrays: mean and var have shape [batch_size], but the thread index idx ranges over [0, feature_size), so mean[idx] and var[idx] are wrong.

When batch_size=1, the correct accesses are mean[0] and var[0]. A unit test with batch_size=1 was added.

ad5f7494
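
The indexing rule described in this fix can be sketched in plain Python, assuming a row-major `[batch_size, feature_size]` input flattened into one list. This layout and the helper `layer_norm_rows` are illustrative assumptions, not the kernel's actual thread mapping. Each element reads its statistics through its row index `idx // feature_size`, so `mean` and `var` only ever need `batch_size` entries.

```python
import math


def layer_norm_rows(x, batch_size, feature_size, eps=1e-5):
    """Normalize each row of a flattened [batch_size, feature_size] input.

    mean and var have shape [batch_size]; element idx belongs to row
    idx // feature_size, which is the index used to read them.
    """
    assert len(x) == batch_size * feature_size
    mean = [0.0] * batch_size
    var = [0.0] * batch_size
    for row in range(batch_size):
        vals = x[row * feature_size:(row + 1) * feature_size]
        mean[row] = sum(vals) / feature_size
        # population variance, as layer norm uses
        var[row] = sum((v - mean[row]) ** 2 for v in vals) / feature_size

    y = [0.0] * len(x)
    for idx in range(len(x)):
        row = idx // feature_size  # correct: row index, not idx itself
        y[idx] = (x[idx] - mean[row]) / math.sqrt(var[row] + eps)
    return y, mean, var
```

With `batch_size=1`, every `idx` maps to row 0, which matches the fix's use of `mean[0]` and `var[0]`; reading `mean[idx]` instead would raise `IndexError` as soon as `idx >= 1`.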

Aug 23, 2021 · 1 commit

Refactor the organization of layer_norm cuda impl. (#34883) · 7f5eb533

Committed by Li Min on Aug 23, 2021

Refactor the organization of the layer_norm CUDA implementation so that it can be reused in the fused attention op.

Extract the layer_norm CUDA implementation from layer_norm_op.cu into layer_norm_kernel.cu.h.
Define fused/attention_layer_norm.h, which can be used by the fused attention op in the next PR.
Define fused/attention_layer_norm.h, which can be used in fused attention op in next PR.

7f5eb533

Jun 24, 2021 · 1 commit

fix bug when the cuda kernel config exceeds dims max (#33748) · 56692f66

Committed by Leo Chen on Jun 24, 2021

56692f66

Jun 22, 2021 · 1 commit

fix gpt2 train loss Nan problem (#33658) · 687571f2

Committed by zhiboniu on Jun 22, 2021

687571f2

Jun 15, 2021 · 1 commit

1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape (#33535) · c5a6ae4c

Committed by Shang Zhizhou on Jun 15, 2021

* 1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape

* remove useless code

c5a6ae4c

Jun 12, 2021 · 1 commit

Fix LayerNorm Problem (#33420) · fe94db6c

Committed by zhiboniu on Jun 12, 2021

* Eliminate numerical differences of LayerNorm; fix LayerNorm NaN bug with large data input

* fix bug with large input data shapes

fe94db6c
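
The commit message does not say how #33420 removed the NaN. One well-known source of NaN in layer norm on large-magnitude inputs is computing the variance as E[x^2] - E[x]^2, which cancels catastrophically and can lose all precision or even go slightly negative, so that `sqrt(var + eps)` becomes NaN. The sketch below contrasts that formula with Welford's one-pass algorithm; both helpers are hypothetical illustrations of the general technique, not the code changed in that PR.

```python
def variance_naive(xs):
    """E[x^2] - E[x]^2: cheap, but cancels catastrophically when |mean| >> std."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(v * v for v in xs) / n
    return mean_sq - mean * mean  # may be inaccurate or even negative


def variance_welford(xs):
    """Welford's online algorithm: one pass, numerically stable."""
    count, mean, m2 = 0, 0.0, 0.0
    for v in xs:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)  # uses the updated mean
    return m2 / count  # population variance, as layer norm uses
```

For `[1e8 + 1, 1e8 + 2, 1e8 + 3]`, whose true variance is 2/3, the naive formula returns 0.0 in float64 because the squares near 1e16 cannot represent the small differences, while Welford's update works on deviations from the running mean and recovers about 0.6667.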

Jun 8, 2021 · 1 commit

add dynamic layer_norm plugin (#33293) · 45d1ae21

Committed by Shang Zhizhou on Jun 8, 2021

* add dynamic layer_norm plugin

* fix bug

* fix numpy.allclose

* fix format

* fix code style

* remove shape in dynamic shape

* code format

* remove layer norm fp16

* fix format

45d1ae21

Mar 19, 2021 · 1 commit

[ROCM] fix layer_norm, norm, p_norm, test_sequence_softmax_op, test_math_op_patch_var_base (#31709) · 420527f0

Committed by ronnywang on Mar 19, 2021

420527f0

Mar 2, 2021 · 1 commit

[ROCM] update fluid operators for rocm (part8), test=develop (#31309) · 59940cb3

Committed by Qi Li on Mar 2, 2021

59940cb3

Jan 15, 2021 · 1 commit

Fix float64 bug in layer norm (#30452) · 008b0a8b

Committed by Yang Zhang on Jan 15, 2021

built-in `rsqrt` is shadowed

008b0a8b

Dec 14, 2020 · 1 commit

Fix compile problem when cuda_arch < 6000 (#29576) · c0163837

Committed by Leo Chen on Dec 14, 2020

* fix compile problem when cuda_arch < 6000

* refine code

* refine code

c0163837

Dec 10, 2020 · 1 commit

Layernorm opt (#29522) · 9f926eb7

Committed by Leo Chen on Dec 10, 2020

* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for windows CI compatibility

* merge develop
Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com>

9f926eb7
