提交 · cce2b94df86b9df3a9bfed46ad07fdcd5fa8c894 · PaddlePaddle / Paddle

02 3月, 2023 4 次提交

[GetCurrentCUDAStream] Add C++ API GetCurrentCUDAStream (#51027) · cce2b94d

由 HongyuJia 提交于 3月 02, 2023

* polish codes according #50813

* [getCurrentCUDAStream] Add C++ API getCurrentCUDAStream

* change get->Get

* wrap with macro

* use Get instead of get

cce2b94d

[AMP OP&Test] register fp16 and bf16 kernel for uniform_random (#50993) · 72f34450

由 Leo Chen 提交于 3月 02, 2023

* register fp16 and bf16 kernel for uniform_random

* fix compile

* support selected_rows

* add ut

* revert cpu

* fp16 test skip cpu

72f34450

Add concat grad cinn (#50972) · a4689c90

由 wangzhen38 提交于 3月 02, 2023

* [cinn] concat_grad

* [cinn] concat_grad

* [cinn] concat_grad build success

* [Add PGLBOX] fix unnitest

* [Add PGLBOX] fix unnitest

* [Add PGLBOX] fix codestyle

* [cinn] update by comments

* [cinn] update by comment

* [cinn] add axis check

a4689c90

H

Change xpu_context.h to cut off unrelated dependency (#51079) · b535d6ce
由 haosicheng 提交于 3月 02, 2023

b535d6ce

01 3月, 2023 12 次提交

Integration flash attention (#49869) · 61611786

由 Chitsing KUI 提交于 3月 01, 2023

* flash attn

* seed

* almost

* softmax

* fix workspace

* add unitest; linux only

* fix setup

* fix datatype include

* fix setup typo

* fix def scope

* new error api

* use paddle fork

* fix attr bug; complete ut

* update flash hash

* fix rng reset

* fix offset

* fix comments

61611786

[Tensor Operants & Prim-Relevant] Tensor supports logical operants (#50983) · 1794927b

由 HongyuJia 提交于 3月 01, 2023

* Add comments for #50886

* [Tensor Operants & Prim-Relevant] Tensor supports logical operants

* add prim dynamic unit test

* add prim static unit test

1794927b

add topk prim backward (#50679) · 296b3ff0

由 zqw_1997 提交于 3月 01, 2023

* tmp gather vjp

* support gather

* remove useless code

* fix compiling error

* fix ut

* add eager test

* add eager test

* add seed

* small change

* fix cpu error

* fix transpose op compat

* remove tensor index case

* fix prim_cinn

* small commit

* add cumsum prim backward

* small commit

* skip aixs=None test case

* fix op generante eror

* fix static test error

* remove unused code

* fix static test error

* small commit

* skip cpu float16 test case

* skip eager cpu cumsum float16 test case

* add eager and static UT

* fix ut

* add composite backward rule

* fix error

* fix type error and format error

* add try cpu+float16 test

* fix test bugs

* remove test for cpu+float16 and make y[0] be the grad arg

* add cinn test

* fix UT

* fix the wrong dim of v in test cases

* change y[0] to y[1] for grad in UT

* reshape flatten out

* Disable cinn single test

* use scatter_nd_add

* modify the reshape part of topk_grad

* delete useless build file

* to make the syntax right

* modify bug

* try use of put_along_axis

* remove cinn test

* reformat todo

* add silu composite rule

* fix code style.

* add cinn test

* fix composite grad maker code gen

* add prim in cumsum op test

* remove old test

* fix typro

* pass the static test

* fix typro

* modify optest and delete old test files

* remove normal test_top_k_op test

* fix typro

* pass axis=None test case

* buffer comment

* for debug

* add silu fp16 unit test.

* add static guard

* remove forward prim test

* remove same name axis

* modify the test_top_v2_op.py to pass all local tests

* delete the useless testcase

* fix mistake

* add more testcases to test dtype16 and dtype32

---------
Co-authored-by: NJiabinYang <360788950@qq.com>
Co-authored-by: NGGBond8488 <857631483@qq.com>
Co-authored-by: Nzxcd <228587199@qq.com>
Co-authored-by: NCharles-hit <wanghao107@baidu.com>

296b3ff0

[Zero-Dim] Add Expand/Expand_as/Top_k for XPU to support Zero Dim Input. (#50947) · 226b4a95

由 yunyaoXYY 提交于 3月 01, 2023

* Add unitest from shilong

* Add kernel code from shilong

* fix codestyle

* add broadcast_shape test

* fix unitest

* fix unitests

* fix unitest

* add 0D grad support

* add 0D grad support

* add 0D grad support

* fix 0D tensor

* fix 0D

* fix xpu 0D

* fix expand kernel

* fix xpu expand

* Fix 0D kernel

* fix 0D

* fix 0D

* fix 0D

* fix 0D

* fix XPU top_k

* cancel the modify of xpu

* add XPU 0D tensor

* fix 0D

226b4a95

W

fix the backward bug of cumsum (#50997) · 934934d8
由 wawltor 提交于 3月 01, 2023

934934d8
M

[xpu] fix bugs of split/embedding_with_wltwise_add/beam_search_decode kernel (#51052) · 753fa844
由 mayang002 提交于 3月 01, 2023

753fa844
C
fix zero bug of case18: paddle.logsumexp (#51034) · 2f900965
由 chenxiao120660 提交于 3月 01, 2023
```
* fix bug of logsumexp

* fix bug for logsumexp

* fix bug for logsumexp
```
2f900965
C

add op map (#51026) · 83f61bd5
由 cyber-pioneer 提交于 3月 01, 2023

83f61bd5
N

Add multiprecision for rms op (#50132) · 48060b2e
由 niuliling123 提交于 3月 01, 2023

48060b2e

[XPU] Add kernels for VITDET (#50992) · 798b527c

由 duanyanhui 提交于 3月 01, 2023

* add support of int64 add for xpu

* add transpose support for int64

* add randperm kernel

* fix randperm

* add distribute_fpn_proposal kernel

* fix comment

* add reduce_sum_int32

798b527c

E

fix custom plugin include headers error (#51013) · a548e70c
由 engineer1109 提交于 3月 01, 2023

a548e70c
R

fix gcc12 error (#51037) · ed511175
由 risemeup1 提交于 3月 01, 2023

ed511175

28 2月, 2023 9 次提交

G
【Hackathon No.69】[PHI decoupling] move device_wrapper from fluid to phi (#50749) · 7b6d5ac0
由 gouzil 提交于 2月 28, 2023
```
* [phi] move device_wrapper from fluid to phi

* [phi] fix ‘PADDLE_ENFORCE_XDNN_SUCCESS’ was not declared in this scope
```
7b6d5ac0
H

[Tensor Operants & Prim-Relevant] Tensor API support default value (#50928) · 2e6e188a
由 HongyuJia 提交于 2月 28, 2023

2e6e188a

【prim】Matmul double grad composite api (#50452) · a0c473f4

由 xiaoguoguo626807 提交于 2月 28, 2023

* modify name

* merge develop

* original code

* build modify

* success 2*2

* fused dim=1 failed

* success

* modify static

* success for static except dim=1

* delete log

* tmp modify

* success

* success

* add fp1664

* delete fp16 cpu test

* stop windows test

* review modify

* modify tanh test

* modify tanh

* fix_conflixt

* modift static prim

* fix_conflict

* Update test_static_prim.cc

* update

* bug fix

a0c473f4

H
[C++ API GetAllocator] Add C++ `GetAllocator` interface (#50813) · 74446b37
由 HongyuJia 提交于 2月 28, 2023
```
* [C++ API GetAllocator] Add C++ GetAllocator interface

* move api to accurate directory
```
74446b37

add cumsum prim backward (#50565) · ca2b6095

由 GGBond8488 提交于 2月 28, 2023

* add cumsum prim backward

* skip aixs=None test case

* fix op generante eror

* fix static test error

* remove unused code

* fix static test error

* skip cpu float16 test case

* skip eager cpu cumsum float16 test case

* add cinn test

* reshape flatten out

* Disable cinn single test

* remove cinn test

* reformat todo

* add prim in cumsum op test

* remove old test

* fix typro

* fix typro

* fix typro

* pass axis=None test case

* remove forward prim test

* remove same name axis

ca2b6095

Z

[XPU] support convert fp16 model (#50790) · f265a313
由 zhupengyang 提交于 2月 28, 2023

f265a313
S

xpu gaussian_random support fp16 (#50881) · 569b018e
由 shentanyue 提交于 2月 28, 2023

569b018e
T

xpu-paddlepaddle-57 [任务] adamw lr_radio支持 (#50979) · dda74715
由 taixiurong 提交于 2月 28, 2023

dda74715

【Prim】Reshape, transpose, cast vjp (#50778) · ab1b6303

由 Jiabin Yang 提交于 2月 28, 2023

* support transpose and reshape

* support reshpe, transpose, cast vjp

* merge develop

* recover unused file

* remove prim base

* support problem

* remove additional status settting

* remove additional status settting

* fix ut

* fix ut

* fix ut

* fix no grad branch

* add more test

* disable fp16 in cpu

* fix test

ab1b6303

27 2月, 2023 8 次提交

[XPU] add fp16 support for shape and lookup_table_v2 op. (#50773) · d2a0577a

由 houj04 提交于 2月 27, 2023

* [XPU] add fp16 support for shape op.

* [XPU] add fp16 support for lookup_table_v2 op.

* update approval list: add qingshu's id.

d2a0577a

张

【Hackathon No.68】Remove utils in phi (#50833) · 6c181d1d

由张春乔提交于 2月 27, 2023

* remove utils

* remove utils

* remove utils

* remove utils

* Update get_data_from_tensor.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* remove utils

* Update rnn_functor.h

* remove utils

* remove utils

* remove utils

* remove utils

* remove utils

* Update rnn_functor.h

* Update unsqueeze_op.h

* Update utils.h

* roll back

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* add TensorToVector

* roll back

* Update tensor_utils.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update tensor_utils.h

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* TensorCopySync to phi::Copy

* fix codestyle

* rnn_kernel.cc: add ;

* replace all GetDataFromTensor with phi::GetVectorFromTensor

* delete include of util.h

6c181d1d

H
[Tensor Operants & Prim] Tensor pow API uses elementwise_pow (#50886) · 8a097399
由 HongyuJia 提交于 2月 27, 2023
```
* [Tensor Operants & Prim] Tensor pow API uses elementwise_pow

* unittest change to fill_constant+elementwise_pow
```
8a097399
B
Reduce redundant cpu computation in slice compute (#50348) · 8aec0580
由 Bo Zhang 提交于 2月 27, 2023
```
* conflict

* add UpdateSliceAttrs
```
8aec0580
Y

Add PADDLE_THROW in ToCudaDataType and polish codes. (#50922) · 2eeaaa7d
由 Yiqun Liu 提交于 2月 27, 2023

2eeaaa7d
revert reshape 0 represent copy and support perm < 0 for paddle.transpose (#50720) · 3669868d
由 zhouweiwei2014 提交于 2月 27, 2023

3669868d
W
xpu: bind op scatter_nd_add. add data type for transpose2, clip & assign_value (#50825) · 0d12afea
由 wangshengxiang 提交于 2月 27, 2023
```
* [XPU] bind op scatter_nd_add

* [XPU] add more data type for op: clip, transpose2 & assign_value
```
0d12afea

[Bfloat16]register bfloat16 datatype for squared l2 norm (#50908) · 3c121040

由 shaojie_wang 提交于 2月 26, 2023

* register bfloat16 datatype for squared l2 norm

* register bfloat16 datatype for softmax with upper triangular mask

* register bfloat16 for tril triu cuda kernel

3c121040

26 2月, 2023 2 次提交

Matmul performance optimization with cuBlasLt (#46431) · d4217fc6

由 limingshu 提交于 2月 26, 2023


* implement of matmul using cublasLt instead of cublas

* Update matmul_kernel_impl_via_blasLt.h

---------
Co-authored-by: Nzhangbopd <1299246947@qq.com>
Co-authored-by: NBo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>

d4217fc6

Enable matmul + bias fusion in fused_gat_attention. (#50755) · 57f6a469

由 Yiqun Liu 提交于 2月 26, 2023

* Enable matmul + bias fusion in fused_gat_attention.

* Add a variable to control whether using fused matmul + bias.

57f6a469

25 2月, 2023 1 次提交
- Z
  Rename elementwise_heaviside to heaviside (#50821) · 8129c22e
  由 zyfncg 提交于 2月 25, 2023
```
* rename elementwise_heaviside to heaviside

* delete __init__.py

* fix bug
```
  8129c22e
24 2月, 2023 4 次提交
- Y
  
  [Zero-Dim] Support 0D Tensor input for topk/broadcast_to/expand/expand_as/broadcast_shape (#50536) · 5041158f
  由 yunyaoXYY 提交于 2月 24, 2023
  
  5041158f
- N
  
  Fix KP operator Kernel selection error (#50178) · 6ef3f2ce
  由 niuliling123 提交于 2月 24, 2023
  
  6ef3f2ce
- Y
  
  Fix libpaddle_inference.so symbol conflicts with other .so (gflags) (#50787) · 041ea14c
  由 Yuanle Liu 提交于 2月 24, 2023
  
  041ea14c
- support 'backend' in static ops (#50671) · 363825df
  由 HappyHeavyRain 提交于 2月 24, 2023
```
* support 'backend' in static ops

* change bitwise_xx comment in python

* change bitwise_xxx comment in python

* change 'backend' and 'data_type' in GetExpectedKernelType
```
  363825df

PaddlePaddle / Paddle 1 年多 前同步成功

PaddlePaddle / Paddle
1 年多前同步成功