Commits · 203a62b803b3b6bae9d5e0478f8f6872d422d95c · PaddlePaddle / Paddle

06 Mar, 2023 1 commit

oneDNN kernels code cleanup (#50743) · e2054925

Authored by Sławomir Siwek on Mar 06, 2023

* matmul refactored

* fc

* SetOutMemDescWithLogicalLayoutFusesSupport

* matmul_v2

* alpha support

* group repetetive funcs

* matmul utils

* execute matmul methods

* restore registered kernel names

* split header and impl files

* remove double negatives

* increase coverage

* add onednn tests to ctest

* remove fusion logic from base matmuls

e2054925

03 Mar, 2023 5 commits
- 【Hackathon No.70】[PHI decoupling] move jit kernels from fluid to phi (#50911) · 2d36c9a9
  Authored by gouzil on Mar 03, 2023
```
* [phi] move jit kernels from fluid to phi

* [phi] fix paddle::phi err

* [phi] fix windows 'posix_memalign': identifier not found

* [phi] fix windows 'posix_memalign_free': identifier not found

* [phi] fix readme directory structure, fc_functor  paddle::platform
```
  2d36c9a9
- [PHI Decoupling]Remove memory header (Part2) (#50870) · 558068cc
  Authored by YuanRisheng on Mar 03, 2023
```
* decouple memory copy

* fix ci bugs

* fix ci compile bugs

* fix rocm compile

* fix ci bugs
```
  558068cc
- Fix batch_norm momentum (#51120) · d9fb639c
  Authored by zhangkaihuo on Mar 03, 2023
  
  d9fb639c
- add gather_nd_comp_grad composite rule (#50966) · 625e30b7
  Authored by wangxiaoning on Mar 03, 2023
```
* comp gather_nd_grad

* fix

* test no cinn

* fix

* fix cinn
```
  625e30b7
- Add multi_precision for adagrad op (#50078) · 4779c2c1
  Authored by niuliling123 on Mar 03, 2023
  
  4779c2c1
02 Mar, 2023 9 commits

New executor static build for fluid kernel (#50670) · bf50784c

Authored by Ruibiao Chen on Mar 02, 2023

* Check structed kernel for new executor static build

* Update code

* Ready for resnet50

* Move transfer_dtype to phi

* Ready for transformer

* Fix CI errors

* Fix layer_norm InferMeta

* Remove layer_norm infermeta fix

bf50784c

Cache for cublaslt descriptor (#50931) · 819f8939

Authored by limingshu on Mar 02, 2023

* first commit

* finish base work

* modification for good

* fix for cache setting and gather the algo and desc as one data for cache storage

* fix for cache setting and gather the algo and desc as one data for cache storage

* install pre-commit check

819f8939

fix zero bug of case21: paddle.mode (#51091) · 25d3ed65
Authored by chenxiao120660 on Mar 02, 2023

25d3ed65

fix divide zero bug for paddle.all (#51088) · 2bcd3935
Authored by ahahahahahaha on Mar 02, 2023

2bcd3935

[XPU] add smallest mode for top_k (#51053) · 0fd6e2a1
Authored by wangshengxiang on Mar 02, 2023

0fd6e2a1

[GetCurrentCUDAStream] Add C++ API GetCurrentCUDAStream (#51027) · cce2b94d

Authored by HongyuJia on Mar 02, 2023

* polish codes according #50813

* [getCurrentCUDAStream] Add C++ API getCurrentCUDAStream

* change get->Get

* wrap with macro

* use Get instead of get

cce2b94d

[AMP OP&Test] register fp16 and bf16 kernel for uniform_random (#50993) · 72f34450

Authored by Leo Chen on Mar 02, 2023

* register fp16 and bf16 kernel for uniform_random

* fix compile

* support selected_rows

* add ut

* revert cpu

* fp16 test skip cpu

72f34450

Add concat grad cinn (#50972) · a4689c90

Authored by wangzhen38 on Mar 02, 2023

* [cinn] concat_grad

* [cinn] concat_grad

* [cinn] concat_grad build success

* [Add PGLBOX] fix unnitest

* [Add PGLBOX] fix unnitest

* [Add PGLBOX] fix codestyle

* [cinn] update by comments

* [cinn] update by comment

* [cinn] add axis check

a4689c90

Change xpu_context.h to cut off unrelated dependency (#51079) · b535d6ce
Authored by haosicheng on Mar 02, 2023

b535d6ce

01 Mar, 2023 12 commits

Integration flash attention (#49869) · 61611786

Authored by Chitsing KUI on Mar 01, 2023

* flash attn

* seed

* almost

* softmax

* fix workspace

* add unitest; linux only

* fix setup

* fix datatype include

* fix setup typo

* fix def scope

* new error api

* use paddle fork

* fix attr bug; complete ut

* update flash hash

* fix rng reset

* fix offset

* fix comments

61611786

[Tensor Operants & Prim-Relevant] Tensor supports logical operants (#50983) · 1794927b

Authored by HongyuJia on Mar 01, 2023

* Add comments for #50886

* [Tensor Operants & Prim-Relevant] Tensor supports logical operants

* add prim dynamic unit test

* add prim static unit test

1794927b

add topk prim backward (#50679) · 296b3ff0

Authored by zqw_1997 on Mar 01, 2023

* tmp gather vjp

* support gather

* remove useless code

* fix compiling error

* fix ut

* add eager test

* add eager test

* add seed

* small change

* fix cpu error

* fix transpose op compat

* remove tensor index case

* fix prim_cinn

* small commit

* add cumsum prim backward

* small commit

* skip aixs=None test case

* fix op generante eror

* fix static test error

* remove unused code

* fix static test error

* small commit

* skip cpu float16 test case

* skip eager cpu cumsum float16 test case

* add eager and static UT

* fix ut

* add composite backward rule

* fix error

* fix type error and format error

* add try cpu+float16 test

* fix test bugs

* remove test for cpu+float16 and make y[0] be the grad arg

* add cinn test

* fix UT

* fix the wrong dim of v in test cases

* change y[0] to y[1] for grad in UT

* reshape flatten out

* Disable cinn single test

* use scatter_nd_add

* modify the reshape part of topk_grad

* delete useless build file

* to make the syntax right

* modify bug

* try use of put_along_axis

* remove cinn test

* reformat todo

* add silu composite rule

* fix code style.

* add cinn test

* fix composite grad maker code gen

* add prim in cumsum op test

* remove old test

* fix typro

* pass the static test

* fix typro

* modify optest and delete old test files

* remove normal test_top_k_op test

* fix typro

* pass axis=None test case

* buffer comment

* for debug

* add silu fp16 unit test.

* add static guard

* remove forward prim test

* remove same name axis

* modify the test_top_v2_op.py to pass all local tests

* delete the useless testcase

* fix mistake

* add more testcases to test dtype16 and dtype32

---------
Co-authored-by: JiabinYang <360788950@qq.com>
Co-authored-by: GGBond8488 <857631483@qq.com>
Co-authored-by: zxcd <228587199@qq.com>
Co-authored-by: Charles-hit <wanghao107@baidu.com>

296b3ff0

[Zero-Dim] Add Expand/Expand_as/Top_k for XPU to support Zero Dim Input. (#50947) · 226b4a95

Authored by yunyaoXYY on Mar 01, 2023

* Add unitest from shilong

* Add kernel code from shilong

* fix codestyle

* add broadcast_shape test

* fix unitest

* fix unitests

* fix unitest

* add 0D grad support

* add 0D grad support

* add 0D grad support

* fix 0D tensor

* fix 0D

* fix xpu 0D

* fix expand kernel

* fix xpu expand

* Fix 0D kernel

* fix 0D

* fix 0D

* fix 0D

* fix 0D

* fix XPU top_k

* cancel the modify of xpu

* add XPU 0D tensor

* fix 0D

226b4a95

fix the backward bug of cumsum (#50997) · 934934d8
Authored by wawltor on Mar 01, 2023

934934d8

[xpu] fix bugs of split/embedding_with_wltwise_add/beam_search_decode kernel (#51052) · 753fa844
Authored by mayang002 on Mar 01, 2023

753fa844

fix zero bug of case18: paddle.logsumexp (#51034) · 2f900965
Authored by chenxiao120660 on Mar 01, 2023
```
* fix bug of logsumexp

* fix bug for logsumexp

* fix bug for logsumexp
```
2f900965

add op map (#51026) · 83f61bd5
Authored by cyber-pioneer on Mar 01, 2023

83f61bd5

Add multiprecision for rms op (#50132) · 48060b2e
Authored by niuliling123 on Mar 01, 2023

48060b2e

[XPU] Add kernels for VITDET (#50992) · 798b527c

Authored by duanyanhui on Mar 01, 2023

* add support of int64 add for xpu

* add transpose support for int64

* add randperm kernel

* fix randperm

* add distribute_fpn_proposal kernel

* fix comment

* add reduce_sum_int32

798b527c

fix custom plugin include headers error (#51013) · a548e70c
Authored by engineer1109 on Mar 01, 2023

a548e70c

fix gcc12 error (#51037) · ed511175
Authored by risemeup1 on Mar 01, 2023

ed511175

28 Feb, 2023 9 commits

【Hackathon No.69】[PHI decoupling] move device_wrapper from fluid to phi (#50749) · 7b6d5ac0
Authored by gouzil on Feb 28, 2023
```
* [phi] move device_wrapper from fluid to phi

* [phi] fix ‘PADDLE_ENFORCE_XDNN_SUCCESS’ was not declared in this scope
```
7b6d5ac0

[Tensor Operants & Prim-Relevant] Tensor API support default value (#50928) · 2e6e188a
Authored by HongyuJia on Feb 28, 2023

2e6e188a

【prim】Matmul double grad composite api (#50452) · a0c473f4

Authored by xiaoguoguo626807 on Feb 28, 2023

* modify name

* merge develop

* original code

* build modify

* success 2*2

* fused dim=1 failed

* success

* modify static

* success for static except dim=1

* delete log

* tmp modify

* success

* success

* add fp1664

* delete fp16 cpu test

* stop windows test

* review modify

* modify tanh test

* modify tanh

* fix_conflixt

* modift static prim

* fix_conflict

* Update test_static_prim.cc

* update

* bug fix

a0c473f4

[C++ API GetAllocator] Add C++ `GetAllocator` interface (#50813) · 74446b37
Authored by HongyuJia on Feb 28, 2023
```
* [C++ API GetAllocator] Add C++ GetAllocator interface

* move api to accurate directory
```
74446b37

add cumsum prim backward (#50565) · ca2b6095

Authored by GGBond8488 on Feb 28, 2023

* add cumsum prim backward

* skip aixs=None test case

* fix op generante eror

* fix static test error

* remove unused code

* fix static test error

* skip cpu float16 test case

* skip eager cpu cumsum float16 test case

* add cinn test

* reshape flatten out

* Disable cinn single test

* remove cinn test

* reformat todo

* add prim in cumsum op test

* remove old test

* fix typro

* fix typro

* fix typro

* pass axis=None test case

* remove forward prim test

* remove same name axis

ca2b6095

[XPU] support convert fp16 model (#50790) · f265a313
Authored by zhupengyang on Feb 28, 2023

f265a313

xpu gaussian_random support fp16 (#50881) · 569b018e
Authored by shentanyue on Feb 28, 2023

569b018e

xpu-paddlepaddle-57 [Task] adamw lr_radio support (#50979) · dda74715
Authored by taixiurong on Feb 28, 2023

dda74715

【Prim】Reshape, transpose, cast vjp (#50778) · ab1b6303

Authored by Jiabin Yang on Feb 28, 2023

* support transpose and reshape

* support reshpe, transpose, cast vjp

* merge develop

* recover unused file

* remove prim base

* support problem

* remove additional status settting

* remove additional status settting

* fix ut

* fix ut

* fix ut

* fix no grad branch

* add more test

* disable fp16 in cpu

* fix test

ab1b6303

27 Feb, 2023 4 commits

[XPU] add fp16 support for shape and lookup_table_v2 op. (#50773) · d2a0577a

Authored by houj04 on Feb 27, 2023

* [XPU] add fp16 support for shape op.

* [XPU] add fp16 support for lookup_table_v2 op.

* update approval list: add qingshu's id.

d2a0577a

【Hackathon No.68】Remove utils in phi (#50833) · 6c181d1d

Authored by 张春乔 on Feb 27, 2023

* remove utils

* remove utils

* remove utils

* remove utils

* Update get_data_from_tensor.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* remove utils

* Update rnn_functor.h

* remove utils

* remove utils

* remove utils

* remove utils

* remove utils

* Update rnn_functor.h

* Update unsqueeze_op.h

* Update utils.h

* roll back

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* add TensorToVector

* roll back

* Update tensor_utils.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update tensor_utils.h

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* TensorCopySync to phi::Copy

* fix codestyle

* rnn_kernel.cc: add ;

* replace all GetDataFromTensor with phi::GetVectorFromTensor

* delete include of util.h

6c181d1d

[Tensor Operants & Prim] Tensor pow API uses elementwise_pow (#50886) · 8a097399
Authored by HongyuJia on Feb 27, 2023
```
* [Tensor Operants & Prim] Tensor pow API uses elementwise_pow

* unittest change to fill_constant+elementwise_pow
```
8a097399

Reduce redundant cpu computation in slice compute (#50348) · 8aec0580
Authored by Bo Zhang on Feb 27, 2023
```
* conflict

* add UpdateSliceAttrs
```
8aec0580

PaddlePaddle / Paddle: synced successfully more than 1 year ago