Commits · 90c3bddfda207c9b5197a7db99cf98707fdb7a7c · PaddlePaddle / Paddle

Apr 10, 2023 · 15 commits

Autogen code bilinear_tensor_product (#52690) · 90c3bddf
Committed by gouzil on Apr 10, 2023
```
* add autogen code bilinear_tensor_product

* [phi] rm cc file
```

【Hackathon4 No58】fix exponential and pad (#51300) · 3ee2b237
Committed by cyberslack_lee on Apr 10, 2023

Autogen softmax_with_cross_entropy (#52515) · 351ccb63
Committed by lzydev on Apr 10, 2023
```
* autogen softmax_with_cross_entropy

* fix error in softmax_with_cross_entropy version
```

[enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc (#52573) · 3c0b1795

Committed by HongyuJia on Apr 10, 2023

* [enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc

* Add gflags.h for other files

* Add gflags.h for other files

* Add gflags.h for blas_impl.hip.h

* Add gflags.h for miopen_helper.h


[AMP OP&Test] Add fp16 and bf16 test to activation (#52521) · 6bd5fd75

Committed by Vvsmile on Apr 10, 2023

* adjust default tolerance of output and grad

* fix a bug in the grad of OpTest

* fix the type of setting default value in optest, both forward and
backward

* add default

* fix test_sum_op

* adjust tolerance

* fix the tolerance of eager

* add bf16 and fp16 to the activation tests

* remove some fixes

* fix activation

* fix fp16

* fix gelu

* fix the activation tests

* add bfloat16 specialization to singrad and cosgrad

* fix bugs

* fix bugs

* add unittest

* add skip

* add fp/bf to rrelu/rrelu_grad

* git add rrelu

* fix bugs


【AMP OP&Test】instance_norm fp16 and bf16 support. (#52241) · 7c98abd9

Committed by qizhaoaoe on Apr 10, 2023

* add fp16 and bf16 support for instance_norm

* fix the /= operator, which does not support bf16

* fix instance_norm_grad kernel and unittests.

* fix fp32 unittests.

* fix instance_norm_kernel and unittests.

* fix instance_norm_grad_kernel and unittest threshold.

* add fp16/bf16 for instance_norm_grad_grad op.

* add bf16 dtype check.

* fix conflicts.

* fix cpu support for fp32 op and fix type in instance_norm_grad_kernel.

* fix type in instance_norm_kernel.

* fix bf16 outputs in unittests and refine codes.

* fix dx computation.

* delete unused params and header includes.

* add fp16/bf16 for static graph.

* fix device condition for instance_norm op.

* fix instance_norm_grad_grad and bf16 op tests.

* fix op_test to support grad of bf16 can be compared with fp32.

* remove updates.

* add self-defined grad.


add autogen code support for logcumsumexp op (#52682) · 891cf433
Committed by Wang Xin on Apr 10, 2023


remove infrt V1.1 (#52672) · 6913feb0
Committed by jjyaoao on Apr 10, 2023


【PaddlePaddle Hackathon 4 No.36】Optimize the GPU computing performance of the tile op in Paddle (#52482) · 61fe2198

Committed by Zero Rains on Apr 10, 2023

* fix divide zero bug for softmax_with_cross_entropy

* change the single test way

* can run but is slow; the main problem is that I do not know why it is slow

* remove some useless comments

* correct the copyright

* remove some useless change

* if repeat_times == 1, we will not use BroadcastKernel


support auto generate for eigvalsh (#52687) · 93404a61
Committed by cyberslack_lee on Apr 10, 2023

【PaddlePaddle Hackathon 4 No.44】Optimize the GPU computing performance of the logsumexp op in Paddle (#52509) · 0e776965
Committed by Asthestarsfalll on Apr 10, 2023
```
* Optimize the performance of logsumexp

* Support zero-dim tensor
```

support custom device on macos (#52620) · 575cafb4
Committed by lishicheng1996 on Apr 10, 2023


add tensor_utils.h into all.h (#52600) · 3cbcaf1a
Committed by zyfncg on Apr 10, 2023


add autogen code support for affine_grid op (#52560) · 90280542

Committed by Wang Xin on Apr 10, 2023

* add autogen code support for affine_grid op

* update op_compat.yaml for affine_grid

* update op_compat.yaml for affine_grid

* fix AffineGridGradInferMeta

* fix CI error

* update AffineGridInferMeta


modify ~MatmulDescriptor and remove [-Wunused-function] (#52618) · 45f660dd

Committed by Galaxy1458 on Apr 10, 2023

* delete [-Wno-error=terminate], test=develop

* remove GPUps[-Wterminate],test=develop

* remove some -Wno-, test=develop

* modify ~MatmulDescriptor

* mess


Apr 09, 2023 · 4 commits
- [PHI CAPI] support complex dtype kernel (#52414) · b60f48ce
  Committed by ronnywang on Apr 09, 2023
```
* [PHI CAPI] support complex dtype kernel

* update
```
- fix fused_dropout_add bug (#52644) · 5df1296d
  Committed by Chitsing KUI on Apr 09, 2023
  
- add bf16 for some ops in static mode (#51582) · 6cd095fc
  Committed by shaojie_wang on Apr 08, 2023
  
- add autogen code support for matrix_nms. (#52479) · 8abc5333
  Committed by scotty on Apr 09, 2023
```
* add autogen code support for matrix_nms.

* update
```
Apr 08, 2023 · 1 commit
- support auto generate static for truncated_gaussian_random (#52540) · ed9bac2f
  Committed by RedContritio on Apr 08, 2023
  
Apr 07, 2023 · 7 commits
- Expose capi in MacOS to enable GPU computing as custom device (#52589) · 36ec9d2f
  Committed by lishicheng1996 on Apr 07, 2023
  
- Isolate DenseTensor::set_type and DenseTensor::set_layout from header file (#52591) · f5ae67e8
  Committed by Ruibiao Chen on Apr 07, 2023
```
* Isolate DenseTensor::set_type from header file

* Fix selected_rows
```
- add autogen code support for warpctc op (#52610) · a62de41a
  Committed by Zhenghai Zhang on Apr 07, 2023
  
- add distributed p_send/p_recv/reduce_scatter operator (#51858) · 2b12a117
  Committed by TaoTao Li on Apr 07, 2023
```
fix merge conflicts
```
- support auto generate static for tril_indices and triu_indices (#52537) · f3e8c4be
  Committed by RedContritio on Apr 07, 2023
  
- clean up WITH_MLU (#52546) · e75c01f9
  Committed by Wang Xin on Apr 07, 2023
  
- add argmax to ops (#52562) · d947b20a
  Committed by engineer1109 on Apr 07, 2023
  
Apr 06, 2023 · 13 commits


fix build bug (#52566) · 6c01ce8a
Committed by yuehuayingxueluo on Apr 06, 2023


Remove oneDNN-specific attributes from matmul (#49444) · 4d97b25d

Committed by Sławomir Siwek on Apr 06, 2023

* replace matmul with matmul_v2 in fuse passes

* Remove fusion logic from matmul

* removing fusion methods

* add proper name

* adjust namespaces

* clean attrs in python tests

* delete checkpoint and restore matmul version

* remove unused code

* matmul and reshape/transpose fuses migrated

* split MatmulOneDNN headers

* fuse activation and eltwise_add

* add fuse_activation

* matmul_transpose_reshape/reshape_transpose_matmul

* matmul + elementwise_add (fused)

* activation temporary modification

* restore matmul(v1) version 0

* merge newest develop

* remove dependency from other PR

* revert pbtxt

* remove placeholders from matmul_v2

* add description in OPMaker

* remove matmul_v2_op.h and all dependencies

* remove dims changing in base op

* add possibility to fuse already fused_matmul

* restart broken CI

* Empty-Commit

* revert matmul_utils.h

* codestyle

* adjust imports

* add pbtxt file

* 100% matmul unit tests coverage

* trigger CI with minimal changes to develop

* adjust changes to develop

* add fused_matmul op

* inherit base ops

* add "v2"

* move OPMaker

* Gradually add fused_matmul files

* second batch of fused_matmul changes

* split infershapes of matmul_v2 and fused_matmul

* merge code from other PR

* 2023

* inherit fused_matmul from matmul_v2

* Update paddle/phi/backends/onednn/onednn_reuse.h
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* Update paddle/phi/kernels/fusion/onednn/fused_matmul_kernel.cc
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* resolve conflicts

* codestyle

* simplify isgemmlinear

* 2023

* remove import

* reuse methods

* matmul_v2_mkldnn cleanup

* simplify ExecuteMatMulV1Grad

* matmul refactored

* fc

* SetOutMemDescWithLogicalLayoutFusesSupport

* matmul_v2

* alpha support

* group repetitive funcs

* matmul utils

* execute matmul methods

* restore registered kernel names

* split header and impl files

* remove double negatives

* reduce number of modified files

* adjust ExecuteMatmul

* add scales for ut

* dates

* limit number of modified files

* fluid imports

* remove alpha

* codestyle

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>


Move fused_attention op to phi [migrate forward GPU OpKernel] (#51743) · a7ec8958

Committed by Sonder on Apr 06, 2023

* add kernel functions

* update kernel functions

* update func parameters' name

* create codes for gpu device

* adjust file locations

* fix include error

* remove dependent files to phi/

* restore fused_attention_op.cu

* fix dependence errors

* fix dependence errors

* fix include error

* fix all dependence errors [build success]

* remove useless include

* recover useless include

* use phi::ToNCCLDataType

* fix namespace

* update new register code

* fix error in fused_gemm_epilogue_utils

* fix error in FusedAttentionKernel parm

* finish fused_attention register code [build success]

* add paddle::optional

* add sig file

* fix build error

* fix a include error

* update CMakeLists

* fix parameter sequence

* add include file

* update #if before include

* fix grammar error

* update codes for DropoutParam

* remove const cast

* trans some fluid api to phi api

* add #if

* update test code

* update test codes

* recover test codes

* trans fused_attention to fluid

* move #endif to end

* move #endif

* delete useless files

* use fused attention utils and recover random seed

* remove fluid include in phi


add autogen code support for logical_and, logical_not, logical_or and logical_xor (#52451) · 6df4a667
Committed by scotty on Apr 06, 2023


support auto generate static for assign_value (#52534) · d394c9ed
Committed by RedContritio on Apr 06, 2023


support auto generate static for decode_jpeg (#52542) · c1f97a9b
Committed by RedContritio on Apr 06, 2023


mv PADDLE_WITH_ASCEND_CL (#52535) · 80dd1672
Committed by 张春乔 on Apr 06, 2023


support more custom vjp (#52533) · 29c28e2f
Committed by Jiabin Yang on Apr 06, 2023


feat: add composite rule of roll grad (#52532) · 348a36b5

Committed by Kang Zhao on Apr 06, 2023

* feat: add relu composite rule

* feat: add relu composite rule, maximum op

* feat: add relu composite rule, maximum op

* feat: add relu composite rule, polish comments

* feat: add relu composite rule, polish comments

* feat: add relu composite rule, add python api of relu

* feat: add relu composite rule, commit hook

* fix: maximum type error & ban cinn test

* fix: maximum input sequence bugs

* resolve conflicts

* fix: code style bugs

* add: relu fp16 test

* feat: add rsqrt composite rule

* feat: add rsqrt composite rule

* resolve conflicts of composite rule

* fix: delete check eager

* feat: add roll grad composite rule

* fix minus shift

* fix test roll op

Rename conv2d transpose grad grad (#52371) · 49bbd466
Committed by zhangyuqin1998 on Apr 06, 2023
```
* Rename conv2d transpose grad grad

* fix
```

fix backend bug (#52526) · 380a9bf7
Committed by Chitsing KUI on Apr 06, 2023

Fix flash attention bug (#52551) · 8ac5a6b6
Committed by sneaxiy on Apr 06, 2023
```
* fix flash attn

* fix another API
```

[PHI] Adjust files of fusion kernel in PHI (#52420) · 84bb7a96

Committed by zyfncg on Apr 06, 2023

* update readme

* remove unused header file

* fix bug

* fix onednn

* fix onednn

* rename header file
