Commits · 6bd5fd752662d276e4e53e6d30eae1941377fde7 · PaddlePaddle / Paddle

Apr 10, 2023 · 19 commits

[AMP OP&Test] Add fp16 and bf16 test to activation (#52521) · 6bd5fd75

Committed by Vvsmile on Apr 10, 2023

* adjust default tolerance of output and grad

* fix a bug in the grad of OpTest

* fix the type of setting default value in optest, both forward and backward

* add default

* fix test_sum_op

* adjust tolerance

* fix the tolerance of eager

* add bf16 and fp16 to the activation tests

* remove some fixes

* fix activation

* fix fp16

* fix gelu

* fix the activation tests

* add bfloat16 specialization to singrad and cosgrad

* fix bugs

* fix bugs

* add unittest

* add skip

* add fp/bf to rrelu/rrelu_grad

* git add rrelu

* fix bugs


[AMP OP&Test] instance_norm fp16 and bf16 support (#52241) · 7c98abd9

Committed by qizhaoaoe on Apr 10, 2023

* add fp16 and bf16 support for instance_norm

* fix /= operator, which does not support bf16

* fix instance_norm_grad kernel and unittests.

* fix fp32 unittests.

* fix instance_norm_kernel and unittests.

* fix instance_norm_grad_kernel and unittest threshold.

* add fp16/bf16 for instance_norm_grad_grad op.

* add bf16 dtype check.

* fix conflicts.

* fix cpu support for fp32 op and fix type in instance_norm_grad_kernel.

* fix type in instance_norm_kernel.

* fix bf16 outputs in unittests and refine codes.

* fix dx computation.

* delete unused params and header includes.

* add fp16/bf16 for static graph.

* fix device condition for instance_norm op.

* fix instance_norm_grad_grad and bf16 op tests.

* fix op_test so that bf16 grads can be compared with fp32.

* remove updates.

* add self-defined grad.


add autogen code support for logcumsumexp op (#52682) · 891cf433
Committed by Wang Xin on Apr 10, 2023

register fluid kernels to phi [part 7] (#52577) · aa35331f
Committed by huangjiyi on Apr 10, 2023
```
* update

* fix bug

* fix ci-windows-openblas

* fix test_partial_sum_op

* fix codestyle
```

remove infrt V1.1 (#52672) · 6913feb0
Committed by jjyaoao on Apr 10, 2023


[PaddlePaddle Hackathon 4 No.36] Optimize the GPU performance of the tile op in Paddle (#52482) · 61fe2198

Committed by Zero Rains on Apr 10, 2023

* fix divide zero bug for softmax_with_cross_entropy

* change the unit test approach

* it runs but is slow; the main issue is that I do not know why it is slow

* remove some useless comments

* correct the copyright

* remove some useless change

* if repeat_times == 1, we will not use BroadcastKernel


support auto generate for eigvalsh (#52687) · 93404a61
Committed by cyberslack_lee on Apr 10, 2023

[PaddlePaddle Hackathon 4 No.44] Optimize the GPU performance of the logsumexp op in Paddle (#52509) · 0e776965
Committed by Asthestarsfalll on Apr 10, 2023
```
* Optimize the performance of logsumexp

* Support zero-dim tensor
```

support custom device on macos (#52620) · 575cafb4
Committed by lishicheng1996 on Apr 10, 2023


add tensor_utils.h into all.h (#52600) · 3cbcaf1a
Committed by zyfncg on Apr 10, 2023


add autogen code support for affine_grid op (#52560) · 90280542

Committed by Wang Xin on Apr 10, 2023

* add autogen code support for affine_grid op

* update op_compat.yaml for affine_grid

* update op_compat.yaml for affine_grid

* fix AffineGridGradInferMeta

* fix CI error

* update AffineGridInferMeta


modify ~MatmulDescriptor and remove [-Wunused-function] (#52618) · 45f660dd

Committed by Galaxy1458 on Apr 10, 2023

* delete [-Wno-error=terminate], test=develop

* remove GPUps[-Wterminate],test=develop

* remove some -Wno-, test=develop

* modify ~MatmulDescriptor

* mess


fix gcc12 error (#52646) · 66a4804b
Committed by risemeup1 on Apr 10, 2023


delete paddle/fluid/operators/math,metrics,optimizers,reduce_ops/*_npu.* (#52674) · a6aa701e
Committed by jjyaoao on Apr 10, 2023


delete paddle/fluid/operators/collective/*_npu.* (#52677) · b451aff8
Committed by jjyaoao on Apr 10, 2023


delete paddle/fluid/operators/controlflow/*_npu.* (#52676) · 4500b64a
Committed by jjyaoao on Apr 10, 2023


delete paddle/fluid/operators/elementwise/*_npu.* (#52675) · 599a201f
Committed by jjyaoao on Apr 10, 2023


Remove WITH_ASCEND (#52669) · 0f3bbe10

Committed by 张春乔 on Apr 10, 2023

* mv WITH_ASCEND_CL

* mv WITH_ASCEND

* rollback

* remove WITH_ASCEND

* remove WITH_ASCEND

[bug fix] fix pow composite (#52645) · f2d1f284
Committed by wangzhen38 on Apr 10, 2023
```
* [bug fix] fix pow composite

* [bug fix] for ci
```

Apr 09, 2023 · 4 commits
- [PHI CAPI] support complex dtype kernel (#52414) · b60f48ce
  Committed by ronnywang on Apr 09, 2023
```
* [PHI CAPI] support complex dtype kernel

* update
```
- fix fused_dropout_add bug (#52644) · 5df1296d
  Committed by Chitsing KUI on Apr 09, 2023
- add bf16 for some ops in static mode (#51582) · 6cd095fc
  Committed by shaojie_wang on Apr 08, 2023
- add autogen code support for matrix_nms (#52479) · 8abc5333
  Committed by scotty on Apr 09, 2023
```
* add autogen code support for matrix_nms.

* update
```
Apr 08, 2023 · 3 commits
- [StandaloneExe] add strategy force_sequential_run (#52652) · e1692dc7
  Committed by kangguangli on Apr 08, 2023
```
* add strategy force_sequential_run

* fix

* fix

* fix

* fix

* fix
```
- Retire Ascend- and Cambricon-related code: WITH_ASCEND_CL (#52612) · 2b40434e
  Committed by 张春乔 on Apr 08, 2023
```
* mv WITH_ASCEND_CL

* mv WITH_ASCEND

* rollback
```
- support auto generate static for truncated_gaussian_random (#52540) · ed9bac2f
  Committed by RedContritio on Apr 08, 2023
Apr 07, 2023 · 13 commits
- Expose capi in MacOS to enable GPU computing as custom device (#52589) · 36ec9d2f
  Committed by lishicheng1996 on Apr 07, 2023
- Isolate DenseTensor::set_type and DenseTensor::set_layout from header file (#52591) · f5ae67e8
  Committed by Ruibiao Chen on Apr 07, 2023
```
* Isolate DenseTensor::set_type from header file

* Fix selected_rows
```
- [AMP]register bf16 for communication ops (#52555) · 9a0de116
  Committed by shaojie_wang on Apr 07, 2023
```
* register bf16 for communication ops

* fix bfloat16 type finding compile error in c_allreduce_max_op
```
- add autogen code support for warpctc op (#52610) · a62de41a
  Committed by Zhenghai Zhang on Apr 07, 2023
- add distributed p_send/p_recv/reduce_scatter operator (#51858) · 2b12a117
  Committed by TaoTao Li on Apr 07, 2023
```
fix merge conflicts
```
- [prim] support set output dtype for autogen (#52475) · d6a38532
  Committed by Xiaoxu Chen on Apr 07, 2023
- support auto generate static for tril_indices and triu_indices (#52537) · f3e8c4be
  Committed by RedContritio on Apr 07, 2023
- fix_build_ci_error (#52576) · 8630375c
  Committed by risemeup1 on Apr 07, 2023
```
* fix_build_ci_error

* fix_build_ci_error

* fix_build_ci_error
```
- unify kernel (#52594) · 09da1c4c
  Committed by YuanRisheng on Apr 07, 2023
- [CustomDevice] Add enable custom device C Api (#52568) · 5662adcc
  Committed by wenzhe.wang on Apr 07, 2023
```
fix bugs
Co-authored-by: Nwenzhe.wang <wenzhe.wang@xdxct.com>
```
- clean up WITH_MLU (#52546) · e75c01f9
  Committed by Wang Xin on Apr 07, 2023
- [kunlun] bugfix for collective softmax_with_ce (#52565) · 075d6b14
  Committed by jameszhang on Apr 07, 2023
- add argmax to ops (#52562) · d947b20a
  Committed by engineer1109 on Apr 07, 2023
Apr 06, 2023 · 1 commit
- fix build bug (#52566) · 6c01ce8a
  Committed by yuehuayingxueluo on Apr 06, 2023
