Commits · a4689c9086e749185f85908824ea4c8719572c48 · PaddlePaddle / Paddle

01 Mar, 2023 (3 commits)

Integration flash attention (#49869) · 61611786

Committed by Chitsing KUI on Mar 01, 2023

* flash attn

* seed

* almost

* softmax

* fix workspace

* add unittest; linux only

* fix setup

* fix datatype include

* fix setup typo

* fix def scope

* new error api

* use paddle fork

* fix attr bug; complete ut

* update flash hash

* fix rng reset

* fix offset

* fix comments


fix the backward bug of cumsum (#50997) · 934934d8

Committed by wawltor on Mar 01, 2023

Add multiprecision for rms op (#50132) · 48060b2e

Committed by niuliling123 on Mar 01, 2023

27 Feb, 2023 (3 commits)

【Hackathon No.68】Remove utils in phi (#50833) · 6c181d1d

Committed by 张春乔 on Feb 27, 2023

* remove utils

* remove utils

* remove utils

* remove utils

* Update get_data_from_tensor.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* remove utils

* Update rnn_functor.h

* remove utils

* remove utils

* remove utils

* remove utils

* remove utils

* Update rnn_functor.h

* Update unsqueeze_op.h

* Update utils.h

* roll back

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* add TensorToVector

* roll back

* Update tensor_utils.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update tensor_utils.h

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* TensorCopySync to phi::Copy

* fix codestyle

* rnn_kernel.cc: add ;

* replace all GetDataFromTensor with phi::GetVectorFromTensor

* delete include of util.h


revert reshape 0 represent copy and support perm < 0 for paddle.transpose (#50720) · 3669868d

Committed by zhouweiwei2014 on Feb 27, 2023

[Bfloat16]register bfloat16 datatype for squared l2 norm (#50908) · 3c121040

Committed by shaojie_wang on Feb 26, 2023

* register bfloat16 datatype for squared l2 norm

* register bfloat16 datatype for softmax with upper triangular mask

* register bfloat16 for tril triu cuda kernel


25 Feb, 2023 (1 commit)

Rename elementwise_heaviside to heaviside (#50821) · 8129c22e

Committed by zyfncg on Feb 25, 2023

* rename elementwise_heaviside to heaviside

* delete __init__.py

* fix bug

24 Feb, 2023 (2 commits)

[Zero-Dim] Support 0D Tensor input for topk/broadcast_to/expand/expand_as/broadcast_shape (#50536) · 5041158f

Committed by yunyaoXYY on Feb 24, 2023

supplement header file's code (#50826) · 92cae577

Committed by YuanRisheng on Feb 24, 2023

23 Feb, 2023 (2 commits)

[phi decoupling] move generator implementation from fluid to phi (#50746) · 4e417409

Committed by Huang Jiyi on Feb 23, 2023

* move fluid generator to phi

* move fluid generator to phi

* update .gitignore

* fix bugs

* fix cannot find "glog/logging.h" in "generator.h"

* fix bugs


[OptionalOptimization]: LayerNorm forward Optimization with Welford (#50362) · 746b774b

Committed by limingshu on Feb 23, 2023

* first commit

* main codes has been developed

* fix all bugs

* add vectorize input&output

* a test for optimization_of_layer_norm_fwd

* add some changes

* fix memory coalesced access for more optimization.

* fix addition ctest error

* fix according to ci-approval

* remove change on slice


22 Feb, 2023 (1 commit)

Fix some typos. (#50429) · 93b2bf4b

Committed by Shuangchi He on Feb 22, 2023

* Fix some typos.
Signed-off-by: Yulv-git <yulvchi@qq.com>

* pre-commit
Signed-off-by: Yulv-git <yulvchi@qq.com>

---------
Signed-off-by: Yulv-git <yulvchi@qq.com>


21 Feb, 2023 (1 commit)

[PHI Decoupling]Remove memory header (Part1) (#50419) · 1cfcb71d

Committed by YuanRisheng on Feb 21, 2023

* decouple_memory

* perfect memory utils

* fix ci bugs

* fix inference bugs

* fix custom test bugs

* fix coverage bugs

* modify code according to comment

* modify namespace

* deal with compile bugs

20 Feb, 2023 (1 commit)

[PHI decoupling] remove reference to fluid/framework/tensor.h in phi (#50475) · 1c8e15c9

Committed by RedContritio on Feb 20, 2023

17 Feb, 2023 (2 commits)

Rename MultiTensorAdam To FusedAdam (#50449) · e6af9bd2

Committed by yuehuayingxueluo on Feb 17, 2023

* rename multi_tensor_adam to fused_adam

* fix some bugs

* fix CI coverage

* rename test_fused_adam.py

* fix some bug

* add test_fused_adam_op.py

* fix some bugs

* fix fused_adam_op.cc

* fix CI bugs

* fix CI bug

* fix CI bug


[phi decoupling] clean TensorCopy usage in phi (#50538) · b5da73c5

Committed by Huang Jiyi on Feb 17, 2023

* rm framework::tensor_util in phi

* clean TensorCopy

* fix bugs

* fix bugs

* fix bugs

* replace mutable_data

* revert custom_device_test.cc

16 Feb, 2023 (3 commits)

[dy2static-bugfix] fix backward gradient aggregation bugs (#50474) · d4c7774f

Committed by xiongkun on Feb 16, 2023

* [dy2static-bugfix] fix backward gradient aggregation bugs
1. Yolov3 and Yolov5 all face the same problem.

* remove set_device

* code review fix


[Phi decouple] move layer_norm_kernel.cu.h to phi (#50506) · 8910bb4a

Committed by Huang Jiyi on Feb 16, 2023

* move layer_norm_kernel.cu.h to phi

* fix bugs

* fix namespace

* fix bugs

* fix CI-Windows

* replace mutable_data

* fix bugs

* fix bugs


[phi decoupling] remove variable.h in phi (#50407) · 905cefd4

Committed by Huang Jiyi on Feb 16, 2023

* move variable_utils from phi_api_utils to fluid

* fix comment

* update include

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* update

* update

* fix CI-Windows-OpenBLAS

* fix bugs

* fix bugs

* fix bugs

* update include

* move variable_utils to phi_utils

* fix namespace

14 Feb, 2023 (2 commits)

decouple tensor_utils (#50264) · 057cdb95

Committed by engineer1109 on Feb 14, 2023

fix X

remove TensorCopy

codestyle

add fluid memory header

fix symbol

fix cmake

fix cmake

fix context

fix header

fix place

fix context

fix context

fix context

fix code

fix custom context

fix custom context

fix copy

fix data_transform

fix style

remove changes of custom

fix scalar

support int8 for embedding (#50413) · 78eb2d87

Committed by seemingwang on Feb 14, 2023

09 Feb, 2023 (3 commits)

[PHI decoupling] move strided_memcpy.h to phi (#50346) · 17318c1a

Committed by Huang Jiyi on Feb 09, 2023

* decouple strided_memcpy

* move strided_memcpy

* move strided_memcpy to phi

* fix namespace

* update

* fix gpu compile bugs

remove layout_utils in phi (#50355) · 90650534

Committed by Huang Jiyi on Feb 09, 2023

Add MultiTensorAdam OP (#49220) · 10654c77

Committed by yuehuayingxueluo on Feb 09, 2023

* add multi_tensor_adam

* update multi_tensor_base.py, test_multi_tensor_adam.py, adamw.py

* fix adam.py optimizer.py

* fix adamw.py

* fix test_multi_tensor_adam.py

* fix CI bug

* fix CI coverage

* fix ci bug

* fix betapow

* fix some bugs

* fix test_adamw_op.py

* fix CI coverage

* fix multi_tensor_adam_kernel.cc

* fix CI bug

* fix multi_tensor_adam_op.cc and test_multi_tensor_adam.py

* fix code style

* update C++ parts

* remove python parts modification temporarily

* add C++ ut

* update betapow copy code logic

* fix ci ut

* fix windows ci

* fix coverage ci

* improve coverage rate

---------
Co-authored-by: Nsneaxiy <sneaxiy@126.com>

08 Feb, 2023 (2 commits)

Fix bn performance degradation (#50287) · 6f1ec935

Committed by zhangkaihuo on Feb 08, 2023

* fix bn performance degradation

move mixed_vector (#50282) · 35d7d1f0

Committed by Huang Jiyi on Feb 08, 2023

06 Feb, 2023 (1 commit)

phi move ReshapeToMatrix & GetValue (#50139) · d09962a1

Committed by engineer1109 on Feb 06, 2023

03 Feb, 2023 (3 commits)

Fix stack overflow of case8: paddle.unique_consecutive (#49983) · 83077f6f

Committed by RedContritio on Feb 03, 2023

* support negative index in unique_consecutive

* add unittest

* add unittest

Fix stack overflow of case3: paddle.metric.accuracy (#49984) · 97411214

Committed by RedContritio on Feb 03, 2023

* add input check for accuracyOp

* add input check for gpu/accuracyOp

* add unittest

* use rank instead of dimensions in message

* update unittest

* update unittest


Generate some static graph ops (#49906) · 85490f70

Committed by HappyHeavyRain on Feb 03, 2023

* generate some static graph ops

* fix the bug of pow

* add REGISTER_ACTIVATION_OP in operators.cmake

* modify the file operators.cmake

02 Feb, 2023 (2 commits)

Several ops support zero dim on GPU and CPU (#49959) · 5db88d08

Committed by Ccc on Feb 02, 2023

* paddle.nn.functional.softmax
* paddle.nn.functional.log_softmax
* paddle.nn.functional.gumbel_softmax
* paddle.nn.functional.prelu

Fix the FP16 precision problem of add_n. (#50129) · 14dd68e1

Committed by liuruyan on Feb 02, 2023

01 Feb, 2023 (2 commits)

[Zero-Dim] Fix 0-dim tensor for arg_min_max op. (#49570) · e4e94a88

Committed by Zhong Hui on Feb 01, 2023

* fix 0-d tensor for arg_min_max op.

* fix xpu.

* fix zero dims

* fix

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc

* Update test_zero_dim_tensor.py

* Update test_zero_dim_tensor_xpu.py

* Update test_zero_dim_tensor.py

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc

* Update arg_min_max_kernel.cc


[Divide by 0 Error] add norm check (#49966) · 5dfddaea

Committed by gouzil on Feb 01, 2023

* [Divide by 0 Error] add norm check

* [Divide by 0 Error] fix x AttributeError

* [Divide by 0 Error] norm check migrate to c++

31 Jan, 2023 (4 commits)

optimize 2D sync_batch_norm (#49663) · 9a4acfee

Committed by zhangkaihuo on Jan 31, 2023

support fp16 squaredl2norm (#48315) · ce4637c1

Committed by 201716010711 on Jan 30, 2023

add dims check for nms_kernel (#49993) · 4976153d

Committed by RedContritio on Jan 31, 2023

Unify the gpu implementation of stack and unstack to reuse the optimization. (#49748) · 3586e856

Committed by Yiqun Liu on Jan 31, 2023

* Unify the gpu implementation of stack and unstack to reuse the optimization.

* Optimize the cuda implementation of unstack.

* Use GpuMemcpyAsync instead of memory::Copy.

* Fix error of calculating the index.

* Use FastDivMod to further improve the performance of unstack.

30 Jan, 2023 (2 commits)

[Divide by 0 Error] add pinv check (#49951) · f6e874bc

Committed by Ryan on Jan 30, 2023

* add pinv check

* add unittest

* update unittest

* roll back

* fix not call stupid bug

* use context

add phi tensor vector array api from fluid (#49885) · 094e3b8c

Committed by engineer1109 on Jan 30, 2023

replace all TensorFromVector & TensorToVector

AssignKernel async copy

PaddlePaddle / Paddle · synced successfully about 1 year ago