Commits · 553630aafcd956b2dd60ea92520244b3e89c9684 · PaddlePaddle / Paddle

27 Mar, 2023 · 6 commits

unbind support bool dtype (#52080) · 553630aa
Committed by Leo Chen on Mar 27, 2023
```
* unbind support bool dtype

* replace np.array_equal
```
553630aa
Add data type of int, int64 for add kernel. Modify the code style of (#50443) · 62bff0e0
Committed by Leo Guo on Mar 27, 2023
```
instance_norm_grad kernel. Fix bugs that the data type of input is different from output in reduce_sum kernel. test=kunlun
```
62bff0e0
fix_gcc12_error (#52083) · f7267412
Committed by risemeup1 on Mar 27, 2023
```
* fix_gcc12_error

* fix gcc12 error

* fix gcc12 error
```
f7267412

Fused elementwise_(mul/div) (#50428) · 968f7f24

Committed by Sławomir Siwek on Mar 27, 2023

* extract Op and OPMaker to .h

* extend pattern for fused_op

* set "with_residual" default to false

* adjust fuse passes

* remove fc+eltwise flag

* fused_output_scale

* activation attrs

* remove extra attrs

* fix int8/bf16 unit tests

* simplify RecomputeOutputDims

* remove unused method

* Add description for attributes

* add extra check

* adjust op compats

* update quantize test

* fix protobuf parsing error

* fix int8 performance

* fused elementwises

* merge develop

* remove activation

* restore activation for existing add/sub ops

968f7f24

[XPU] layer_norm support fp16 input of scale and bias. (#52091) · 14abafa1
Committed by houj04 on Mar 27, 2023

14abafa1

Fix memory efficient attention bug (#52117) · 019e1cf5

Committed by sneaxiy on Mar 27, 2023

* fix mea compile error

* support 2-D bias

* add inline to avoid compile error

* polish codes

019e1cf5

25 Mar, 2023 · 1 commit
- [Fix Bug] fix get_new_shape and get_new_data_from_tensor not support fallback... · db5204ec
  Committed by Ruibin Cheung on Mar 25, 2023
```
[Fix Bug] fix get_new_shape and get_new_data_from_tensor not support fallback to CPU on custom device (#52002)
```
  db5204ec
24 Mar, 2023 · 7 commits

add phi operator allreduce/reduce (#51857) · 47f87ad3

Committed by TaoTao Li on Mar 24, 2023

* add all_reduce, reduce kernel and api

* fix all_reduce reduce ut

fix reduce op maker conflict

fix merge conflicts

* fix conflicts, rename ReduceOp->ReduceBaseOp in reduce_ops

rename allreduce op, to remove

* fix code format

fix comments

* modify test_collective_reduce_api ut timeout

* fix PR-CI-Build

fix comments: format phi operator

47f87ad3

[PHI Decoupling]Remove memory header (Part3) (#51288) · 3d78e759

Committed by YuanRisheng on Mar 24, 2023

* decouple memory copy

* fix ci bugs

* fix ci compile bugs

* fix rocm compile

* fix ci bugs

* decouple memory

* deal with conflict

* fix xpu compile bugs

* fix xpu bugs

* deal with xpu bugs

* fix cmake bugs

* fix windows bugs

* fix ci bugs

* fix ci bugs

* delete redundance code

* add code for pybind

* fix py3 bugs

* fix ci bugs

3d78e759

[PHI]fix momentum dtype infer (#51353) · 648ec795
Committed by PuQing on Mar 24, 2023
```
* fix momentum dtype infer

* fix momentum datatype

* fix on cpu

* add momentum
```
648ec795
【PaddlePaddle Hackathon 4 No.40】Optimize the GPU computation performance of the kthvalue op for Paddle (#51835) · e18f5339
Committed by thunder95 on Mar 24, 2023
```
* untracked files

* kthvalue perf

* remove unused files

* fix isnan

* fix isnan2

* fix bug

* try to fix rocm error
```
e18f5339

Memory Efficient Attention (#51867) · e5ad3859

Committed by ZhangDY-6483 on Mar 24, 2023

* first version, notest

* return final rst, notest

* use infinity() instead of max

* ut structure

* start up of ut

* generate lse

* update

* add depense

* reconstruct cmake

* move file

* add memory efficient attention and fix blasimpl

* update

* update cmake

* add namespace

* update cmake

* use .cu

* update for pad3d

* bug fix

* bug fix

* update

* bug fix

* update enforce

* add test case

* merge the lse pad

* fix kernel_fn of backward

* fix PADDLE_ENFORCE_EQ and phi_api

* fix PADDLE_ENFORCE

* fix PADDLE_ENFORCE

* rerun coverage

* fix memory efficient attention test

* rerun ci

* add cuda version condition

* add cuda version condition

* delete WIP test

* replace PADDLE_ENFORCE

* edit the namespace of datatype in multiple.cc

* rerun

* rerun

---------
Co-authored-by: Nliuyuang <liuyuang@baidu.com>

e5ad3859

remove copy of index for gather_nd_grad and scatter_nd_add op in xpu (#51871) · b110085f
Committed by zhangyikun02 on Mar 24, 2023

b110085f
Fix roll kernel gpu bug. (#52012) · b6d0dac9
Committed by Yuang Liu on Mar 24, 2023

b6d0dac9

23 Mar, 2023 · 9 commits
- pool2d and pool2d_grad support case of kernel_size > kh/kw for xpu (#51870) · 5f388221
  Committed by zhangyikun02 on Mar 23, 2023
  
  5f388221
- Remove fluid deps in fused_linear_param_grad_add_kernel.cu (#51975) · 5da1a27b
  Committed by sneaxiy on Mar 23, 2023
```
* remove fluid deps in fused_linear_param_grad_add_kernel

* fix compile error

* fix ut error

* follow comments
```
  5da1a27b
- Optimization for DropoutNd on Host side (#51934) · 101c9bb0
  Committed by limingshu on Mar 23, 2023
```
* first commit

* fix bugs

* remove_useless sync
```
  101c9bb0
- [AMP] Add bfloat16 Support for `elementwise_pow` Op (#51888) · 288ad844
  Committed by Lin Manhui on Mar 23, 2023
```
* Add bf16 support for elementwise_pow

* Update ut
```
  288ad844
- gather and gather nd fp16, bf16 support and add ut (#51903) · 5bcdfbb0
  Committed by Yuang Liu on Mar 23, 2023
  
  5bcdfbb0
- [AMP] Add bfloat16 and float16 tests for compare ops (#51978) · a7397e0c
  Committed by yeliang2258 on Mar 23, 2023
```
* add bf16 and fp16 tests

* fix dtype check
```
  a7397e0c
- 【PaddlePaddle Hackathon 4】No.63 fix temporal_shift and conj (#51532) · 1550348e
  Committed by LoneRanger on Mar 23, 2023
```
* add fp16 and bfp16 for temporalshift

* add fp16 and bfp16 for complex

* fix bug

* fix bug

* add fp16 and bf16 for conj

* fix bug

* fix bug

* Update complex_kernel.h

fix bug

* Update temporal_shift_grad_kernel.h

fix bug

* Update temporal_shift_kernel.h

fix bug
```
  1550348e
- [PHI] Add nanmedian output defs (#51358) · a82911a5
  Committed by PuQing on Mar 23, 2023
```
* add nanmedian output defs

* remove the multiclass_nms3 momentum
```
  a82911a5
- 【Hackathon No.45】Add float16 data type support for Paddle logical operators (#50926) · 0480ff5d
  Committed by denglianbin on Mar 23, 2023
```
* finish pr

* skip cpu test for logical

* change test style

* fix error.
```
  0480ff5d
22 Mar, 2023 · 12 commits

[Zero-Dim] Support 0-D tensor for some oneDNN unary kernels (#51687) · 2a3d75bc

Committed by YangQun on Mar 22, 2023

* support 0-d tensor for element wise unary ops

* fix python code style check

* fix approval check

* support 0-d tensor for onednn softmax and logsoftmax kernels

* fix commnets

* fix some unittests

2a3d75bc

add fused dropout add (#51752) · 6ba0507d
Committed by ShenLiang on Mar 22, 2023

6ba0507d
[XPU] fix distribute_fpn_proposals (#51873) · a10718e8
Committed by duanyanhui on Mar 22, 2023
```
* fix distribute_fpn_proposals

* fix bug
```
a10718e8

Extract fused_transpose op dedicated for oneDNN fuse passes (#50021) · 02296977

Committed by Sławomir Siwek on Mar 22, 2023

* extract common methods to reuse

* add header for transpose ops

* fused_transpose

* Split big function

* transpose2 tests

* fused_transpose

* Apply extra attributes

* add pbtxt file

* update pbtxt

* Merge develop

* add more strict op compats

* code  style

* remove mkldnn_data_type

* unify SetOutMemDescWithReshape2FuseSupport

* adjust quantize-dequantize for transpose

* remove appendact

* transpose2 quantization

* fix int8 tests

* adjust transpose_op to current develop

* delete fusion code from transpose_kernel

* add fused transpose to NHWC unittest

* change order

02296977

[PHI] Add multiclass_nms3 output defs (#51355) · 06cb6553
Committed by PuQing on Mar 22, 2023
```
* add nms3 register output defs

* remove nms from set

* remove nms from set
```
06cb6553
【AMP OP&Test】unit test for test_logit_op (#51051) · 289677e2
Committed by Bo Zhang on Mar 22, 2023
```
* test_logit_op

* add cudaKernel to replace eigen impl

* bf16 unit test CI
```
289677e2
Fix type error in adagrad_kernel (#51790) · 8ef020c1
Committed by niuliling123 on Mar 22, 2023

8ef020c1
Revert "[AMP OP&Test] Support float & bfloat16 when using thrust (#51627)" (#51897) · 57e368b8
Committed by Zhang Zheng on Mar 22, 2023
```
This reverts commit 3b2cd23a.
```
57e368b8

Add fused_linear_param_grad_add_kernel (#51805) · f59c5d8b

Committed by sneaxiy on Mar 22, 2023

* add fused_linear_param_grad_add_kernel

* fix compile error

* remove flag

* fix ci compile error

* fix ci compile error

* revert pylayer revision

* fix ci ut

* improve performance

f59c5d8b

【AMP OP&Test】unit test for accuracy_op (#51009) · 8c61a95a

Committed by Bo Zhang on Mar 22, 2023

* test_accuracy_op

* add create_test_fp/bf16_class

* cast after calculation

* change convert_uint16_to_float_ifneed

* delete TestAccuracyOpFp32 according to PR comment

* fix the rtol setting rules in bfloat16 forward

8c61a95a

Case7:paddle.distribution.Beta：fix beta(true stack) (#51847) · 32baca93
Committed by Difer on Mar 22, 2023

32baca93
【AMP OP&Test】Support bf16 scatter and scatter_nd_add, add bf16/fp16 ut. (#51689) · f06dd08d
Committed by Yuang Liu on Mar 22, 2023

f06dd08d

21 Mar, 2023 · 5 commits

[PHI decoupling] Move DataType* from paddle:experimental to phi namespace (#51716) · 4638a62e

Committed by iSerendipity on Mar 21, 2023

* move DataType from paddle::experimental to phi

* convert namespace

* convert namespace

* convert namespace

* clarify namespace

* convert more datatype

* Revert "convert more datatype"

This reverts commit 083b462959e6a22d4d8767707b628b95b396642e.

* convert more in auto_code_generator

* fix conflicts for XPU

* fix namespace conflicts

* fix errors

* Revert "fix errors"

This reverts commit f9d9958b54ee32141112274c8a5c3c381ab0f876.

* fix errors

* fix formatting

4638a62e

[OPT] FlashAttention && ModelParallel (#51617) · 4640f4be
Committed by ShenLiang on Mar 21, 2023
```
* fix flash_attention

* Update mp_layers.py
```
4640f4be
Fix compile error in cublaslt (#51793) · 325feca6
Committed by Zhang Zheng on Mar 21, 2023

325feca6
[Zero-Dim] Support output 0D for argmin/argmax/median/kthvalue/mode/equal_all/allclose (#51889) · cdefcd00
Committed by zhouweiwei2014 on Mar 21, 2023
```
* [Zero-Dim] Support output 0D for argmin/argmax/median/kthvalue/mode/equal_all/allclose

* fix CI
```
cdefcd00
[AMP OP&Test] Support fp16/bf16 for cumsum (#51694) · 01eeba5e
Committed by Siming Dai on Mar 21, 2023
```
* add fp16 unittest

* support bf16 and add unittest

* fix according to review
```
01eeba5e

PaddlePaddle / Paddle · synced successfully about 1 year ago