Commits · 2d02b0c1c50899ee8c90f9dc29127f5b6dcf0a34 · PaddlePaddle / Paddle

May 08, 2023 · 1 commit

[Cherry-pick]Cherry pick 0d output (#53538) · 2d02b0c1

Committed by GGBond8488 on May 08, 2023

* add 0D output support for linalg.slogdet, test=allcase

* fix zero-dim test error, test=allcase

* fix test error test=allcase

* add static backward test, test=allcase

* support_0D_output_for_matrix_rank_multi_dot, test=allcase

* add 0D output test for matrix_rank and multi_dot, test=allcase

* fix assert error, test=allcase

* fix test error, test=allcase

* fix other test error, test=allcase

* fix other test error, test=allcase

* fix test error, test=allcase

* fix matrix_rank and multi_dot test error, test=allcase

* fix test error test=allcase

* fix test zero dim test, test=allcase

* add static backward test for multi_dot, test=allcase

* add tol 2d broadcast test case, test=allcase

* fix test error test=allcase

* fix test error test=allcase

* test=allcase

* support_0d_output_for_linalg.norm

* fix test error test=allcase

* fix 0D test

* fix test error test=allcase

* fix test error test=allcase

* fix tests, test=allcase

* fix error, test=allcase

* fix errors, test=allcase

* add static backward, test=allcase

* add static backward test, test=allcase

* slogdet_support_0D_output

* add new case

* fix tests, test=allcase

* cherry-pick

* cherry-pick

* fix trace gpu kernel 0d error, test=allcase

* fix windows error, test=allcase

* add matrixrank cherry-pick


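The following is a minimal pure-Python sketch of what `slogdet` returns. It is not Paddle's kernel. For a single square matrix, it produces the sign of the determinant and the natural log of its absolute value as two plain scalars, which is what a 0-D output carries for an unbatched input. It assumes Gaussian elimination with partial pivoting, and `slogdet_2d` is a hypothetical name.

```python
import math

def slogdet_2d(a):
    """Return (sign, log|det(a)|) for a square matrix given as a list of lists.

    Hypothetical helper for illustration: Gaussian elimination with partial
    pivoting, not any particular library routine.
    """
    n = len(a)
    m = [list(map(float, row)) for row in a]  # work on a float copy
    sign = 1.0
    logabsdet = 0.0
    for col in range(n):
        # Partial pivoting: choose the row with the largest |value| in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0, -math.inf  # singular matrix: det == 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # each row swap flips the determinant's sign
        p = m[col][col]
        if p < 0:
            sign = -sign  # a negative pivot contributes a factor of -1
        logabsdet += math.log(abs(p))
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = m[r][col] / p
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return sign, logabsdet

sign, logdet = slogdet_2d([[2.0, 1.0], [1.0, 3.0]])
print(sign, logdet)  # det is 5, so sign is 1.0 and logdet is log(5), about 1.609
```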
May 06, 2023 · 2 commits
- [cherry-pick]add flash randomness control and add scaled_dot_product_attention (#53518) · 1d23e0bb
  Committed by zhangkaihuo on May 06, 2023
```
att, cherry-pick: #52902 #53113
```
- [Cherry-Pick] AMP OP&Test support from Hackathon (#53522) · 39b704c1
  Committed by Zhang Zheng on May 06, 2023
```
Low-precision operator support and unit-test additions: merges cherry-picks of 17 Hackathon PRs, adding or completing low-precision support for 25 OPs in total
```
Apr 27, 2023 · 1 commit

[Cherry-Pick]Support output 0D for... · f84ac449

Committed by wangfengsheng1999 on Apr 27, 2023

[Cherry-Pick]Support output 0D for is_empty/as_complex/inner/dot/rank/tensordot/squeeze_/static.accuracy/static.auc/metric.accuracy (#53199)

* support output 0D for is_empty/as_complex/inner/dot/rank/tensordot/squeeze_/static.accuracy/static.auc/metric.accuracy

* test_dot_py

* test_dot_py


Apr 19, 2023 · 1 commit
- [Cherry-Pick] Unique support float16&bfloat16 (#53023) · 00b7c819
  Committed by Zhang Zheng on Apr 19, 2023
```
Add float16 and bfloat16 data type support to unique, and improve the related unit tests.
```
Apr 17, 2023 · 5 commits

[AMP OP&Test]Add BF16 implementation and unit tests of multinomial (#52898) · d19d2486
Committed by Vvsmile on Apr 17, 2023
```
* fix multinomial

* fix test_elementwise

* fix convert_float_to_uint16

* add test_multinomial_op

* fix code style
```

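The multinomial commit changes `convert_float_to_uint16`, a test helper that stores bfloat16 data as uint16 bit patterns. The sketch below shows the general idea. It assumes plain truncation, meaning it keeps the upper 16 bits of the float32 encoding. The function names are hypothetical, and the sketch is not claimed to match Paddle's helper exactly.

```python
import struct

def float_to_bf16_bits(x):
    """Keep the upper 16 bits of the IEEE-754 float32 encoding of x (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bf16_bits_to_float(b):
    """Widen a bf16 bit pattern back to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

print([hex(float_to_bf16_bits(v)) for v in (1.0, -2.0, 3.140625)])  # ['0x3f80', '0xc000', '0x4049']
```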
【PaddlePaddle Hackathon 4 No.49】: Add float16 data type support for bce_loss in Paddle (#50930) · 44e6de98

Committed by thunder95 on Apr 17, 2023

* untracked files

* bce_loss_fp16

* remove unused files

* back max_rel_erro still big

* simplify code

* upd

* fix max_relative_error

* restart ci

* Update test_bce_loss.py

* Update test_bce_loss.py

* Update test_bce_loss.py

* Update test_bce_loss.py

* try to pass test

* restore file

* remove error value

* fix bug

---------
Co-authored-by: Zhang Ting <Douyaer2020@qq.com>


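For reference, the element-wise binary cross-entropy behind `bce_loss` is -(y*log(x) + (1-y)*log(1-x)). The log terms diverge as x approaches 0 or 1, so implementations commonly clamp them. The sketch below is plain Python, not Paddle's kernel. The -100 floor on each log term is an assumption borrowed from PyTorch's documented `BCELoss` behavior, and it is not confirmed for Paddle.

```python
import math

LOG_FLOOR = -100.0  # assumption: floor borrowed from PyTorch's documented BCELoss clamp

def bce_loss(inputs, labels):
    """Element-wise binary cross-entropy: -(y*log(x) + (1-y)*log(1-x)).

    Each log term is clamped from below, so x == 0 or x == 1 yields a large
    finite loss instead of inf.
    """
    out = []
    for x, y in zip(inputs, labels):
        log_x = max(math.log(x), LOG_FLOOR) if x > 0 else LOG_FLOOR
        log_1mx = max(math.log(1.0 - x), LOG_FLOOR) if x < 1 else LOG_FLOOR
        out.append(-(y * log_x + (1.0 - y) * log_1mx))
    return out

print(bce_loss([0.5, 1.0], [1.0, 0.0]))  # first entry is log(2), second is 100.0
```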
【Hackathon No.32】Optimize the GPU performance of the expand_as forward & backward ops in Paddle (#52700) · 3c44e948

Committed by Hanchiao on Apr 17, 2023

* Implement optimized kernel for OP-expand_as.

* Support fp16.
Co-authored-by: Timber-Ye <ye_hanqiao@163.com>
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>

* remove fp16 support

* remove MAX_RANK_SUPPORTED

---------
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>


rename_SliceKernel (#52863) · d2b0d63f
Committed by zhangyuqin1998 on Apr 17, 2023


Add output defs for some kernels Phi register (#52941) · 23f87442

Committed by Sonder on Apr 17, 2023

* add register info for eigh and eig_grad

* add sync_batch_norm_op.cu register info

* add lamb output register info

* add unique register info

* change type name

* change type name

* add output register info for check_finite_and_unscale

* update cmake and config file

* add register info for adagrad

* fix build error

* add sync to run_unittests.sh

* add register info for unique_consecutive

* fix build error

* add eigh to STATIC_BUILD_TESTS

* update eig_kernel.cc

* update eig_kernel.cc

* fix infer meta error

* fix unique register error

* fix lamb register info error

* fix lamb register info

* update lamb register info

* fix lamb

* remove one Output Register

* update static build file

* add eigh op to disable_wingpu_test

* update run_unittests

Apr 14, 2023 · 8 commits
- [AMP OP&Test] Cumprod support fp16 and bf16 (#52919) · 8a850af6
  Committed by Zhang Zheng on Apr 14, 2023
- 【Hackathon4 No58】logcumsum logsum (#51275) · 468869e4
  Committed by cyberslack_lee on Apr 14, 2023
- 【Hackathon4 No58】kthvalue (#51615) · 43efb979
  Committed by cyberslack_lee on Apr 14, 2023
- 【Hackathon No.62】Improve FP16/BF16 unit tests for the digamma and dirichlet operators (#52604) · 7ecbcc08
  Committed by chenxujun on Apr 14, 2023
```
* Add digamma, dirichlet tests

* Fix code
```
- 【Hackathon No.55】add erf FP16 test and BF16 test (#52136) · eeb4d165
  Committed by superwinner1 on Apr 14, 2023
```
* add erf FP16 test
```
- Add angle, bmm tests (#52630) · 6d7ee668
  Committed by chenxujun on Apr 14, 2023
- [phi] move sequence_pool to phi - Step 2: sequence_pool_op (#52750) · b281b221
  Committed by gouzil on Apr 14, 2023
```
* [phi] move sequence_pool kernel to phi

* [phi] mv sequence_pooling to phi funcs

* [phi] mv sequence_pooling_test

* [phi] RollBACK `paddle/fluid/operators/sequence_ops/sequence_pool_op.cc`

* [phi][funcs] fix mutable_data

* [phi][funcs] fix mutable_data
```
- delete unused param from swish_grad and relu6_grad (#52805) · 54e4360a
  Committed by zhangyuqin1998 on Apr 14, 2023
Apr 13, 2023 · 7 commits

【Hackathon No.55】add channel_shuffle FP16/BF16 support and tests (#51884) · 48ccb785
Committed by superwinner1 on Apr 13, 2023
```
* No55 add channel_shuffle FP16/BF16 support and tests
```

【Hackathon No57】add_fp16_bf16_for_dot & bf16_for_cross (#52426) · 205094f0

Committed by Difer on Apr 13, 2023

* add_fp_bf_for_dot & bf_for_cross

* fix error

* fix some error

* fix some error

* change something

* fix magic number


[AMP OP&Test] Support fp16&bf16 in reduce_max (#52862) · e0e044c0
Committed by Zhang Zheng on Apr 13, 2023
```
* [AMP OP&Test] Support fp16&bf16 in reduce_max
```

Add pixel_shuffle pixel_unshuffle fp16/bf16 (#52582) · 2aaed989
Committed by chenxujun on Apr 13, 2023


Add overlap_add, sign tests (#52667) · cb6de765
Committed by chenxujun on Apr 13, 2023


[enforce.h Decouple logging.h] Delete glog/logging.h from enforce.h (#52651) · 5664ea26

Committed by HongyuJia on Apr 13, 2023

* [enforce.h Decouple logging.h] Delete glog/logging.h from enforce.h

* Add logging.h for profiler.cc

* Add logging.h for gloo_utils.h

* Add logging.h for addmm_kernel_impl.h

* Add logging.h for addmm_grad_kernel_impl.h

* Add logging.h for p_send_kernel.cu

* Add logging.h for determinant_grad_kernel_impl.h

* Add logging.h for p_recv_kernel.cu

* Add logging.h for elementwise_grad_base.h

* Add logging.h for transfer_layout_kernel.cc

* Add logging.h for eigvals_kernel.cc and index_select_impl.h

* Add logging.h for all files in kernel directory

* Add logging.h for xpu_info.cc

* Add logging.h for xpu


rename_bilinear_tensor_op (#52745) · eb93b5c9
Committed by zhangyuqin1998 on Apr 13, 2023


Apr 12, 2023 · 3 commits

Optimize performance of unique kernel (#52736) · 8cbeefea
Committed by Zhang Zheng on Apr 12, 2023
```
* Optimize performance of unique kernel

* fix ci
```

[AMP OP&Test] add fp16/bf16 unittest for pool2d op (#52288) · f9b155f9

Committed by Wei Shengyu on Apr 12, 2023

* add bf16 support and bf16/fp16 unittest for pool2d

* add include files

* dbg

* reformat

* reformat

* modify code according to review comment

* remove duplicate code

* remove dup code

* remove useless include

* dbg


[AMP OP&Test] support bf16 for batch norm (#52407) · 523f8a26

Committed by Guoxia Wang on Apr 12, 2023

* [AMP OP&Test] support bf16 for batchnorm

* codestyle

* Update batch_norm_grad_kernel.cu

* Update batch_norm_kernel.cu

* fix codestyle

* fix

* fix

* fix

* fix

* fix

* Update batch_norm_kernel.cc


Apr 11, 2023 · 3 commits
- [AMP OP&Test]Add fp16/bf16 support isnan/isfinite/isinf op (#52259) · aaf873b2
  Committed by WJJ1995 on Apr 11, 2023
```
* add bfp16 test for isfinite

* fixed for ci

* deal with comments

* fixed test

* skip test in cpu

* deal with comments

* fixed for ci

* fixed testcase

* fixed for ci

* fixed for testcase
```
- Add output defs for eigh kernel (#51362) · da0c7e14
  Committed by LinearTemporalLogic on Apr 11, 2023
```
* Add output defs for eigh kernel

* fix

* update

* update

* fix

* fix
```
- [AMP OP&Test] add bf16 fp16 type support for expand_v2_op and top_k_v2_op (#51263) · 5b09dd56
  Committed by Thomas Young on Apr 11, 2023
Apr 10, 2023 · 8 commits

【Hackathon No57】add fp16 & bf16 for flip, fp16 for gaussian (#52380) · 2b0fffc2
Committed by Difer on Apr 10, 2023
```
* add_fp_bf_for_flip_gaussian_random

* forget convert uint

* fix some error

* fix some error
```

【Hackathon4 No58】fix exponential and pad (#51300) · 3ee2b237
Committed by cyberslack_lee on Apr 10, 2023


[enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc (#52573) · 3c0b1795

Committed by HongyuJia on Apr 10, 2023

* [enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc

* Add gflags.h for other files

* Add gflags.h for other files

* Add gflags.h for blas_impl.hip.h

* Add gflags.h for miopen_helper.h


[AMP OP&Test] Add fp16 and bf16 test to activation (#52521) · 6bd5fd75

Committed by Vvsmile on Apr 10, 2023

* adjust default tolerance of output and grad

* fix a bug in the grad of OpTest

* fix the type of setting default value in optest, both forward and backward

* add default

* fix test_sum_op

* adjust tolerance

* fix the tolerance of eager

* add bf16 and fp16 to the activation tests

* remove some fixes

* fix activation

* fix fp16

* fix gelu

* fix the activation tests

* add bfloat16 specialization to singrad and cosgrad

* fix bugs

* fix bugs

* add unittest

* add skip

* add fp/bf to rrelu/rrelu_grad

* git add rrelu

* fix bugs


【AMP OP&Test】instance_norm fp16 and bf16 support. (#52241) · 7c98abd9

Committed by qizhaoaoe on Apr 10, 2023

* add fp16 and bf16 support for instance_norm

* fix /= operator which not support bf16

* fix instance_norm_grad kernel and unittests.

* fix fp32 unittests.

* fix instance_norm_kernel and unittests.

* fix instance_norm_grad_kernel and unittest threshold.

* add fp16/bf16 for instance_norm_grad_grad op.

* add bf16 dtype check.

* fix conflicts.

* fix cpu support for fp32 op and fix type in instance_norm_grad_kernel.

* fix type in instance_norm_kernel.

* fix bf16 outputs in unittests and refine codes.

* fix dx computation.

* delete unused params and header includes.

* add fp16/bf16 for static graph.

* fix device condition for instance_norm op.

* fix instance_norm_grad_grad and bf16 op tests.

* fix op_test to support grad of bf16 can be compared with fp32.

* remove updates.

* add self-defined grad.


【PaddlePaddle Hackathon 4 No.36】Optimize the GPU performance of the tile op in Paddle (#52482) · 61fe2198

Committed by Zero Rains on Apr 10, 2023

* fix divide zero bug for softmax_with_cross_entropy

* change the single test way

* can run but slow. The main problem is that I do not know why it is slow

* remove some useless comments

* change the copyright to correct

* remove some useless change

* if repeat_times == 1, we will not use BroadcastKernel


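The tile change skips `BroadcastKernel` when `repeat_times == 1`. A tile can be expressed as a broadcast followed by a reshape. If the repeat along every axis is 1, the broadcast reduces to a plain copy. The sketch below covers only the shape logic. It assumes `np.tile`-style padding rules, in which the shorter of the shape and `repeat_times` is left-padded with 1s. `tile_plan` is a hypothetical name, not a Paddle function.

```python
def tile_plan(shape, repeat_times):
    """Return (output_shape, needs_broadcast) for tiling a tensor of `shape`.

    Assumes NumPy-style np.tile rules: the shorter of `shape` and
    `repeat_times` is left-padded with 1s before multiplying elementwise.
    """
    rank = max(len(shape), len(repeat_times))
    padded_shape = [1] * (rank - len(shape)) + list(shape)
    padded_reps = [1] * (rank - len(repeat_times)) + list(repeat_times)
    out_shape = [s * r for s, r in zip(padded_shape, padded_reps)]
    # When every repeat is 1, the output holds exactly the input's elements in
    # the same order, so a plain copy suffices and no broadcast is needed.
    needs_broadcast = any(r != 1 for r in padded_reps)
    return out_shape, needs_broadcast

print(tile_plan([2, 3], [2]))        # ([2, 6], True)
print(tile_plan([2, 3], [1, 1, 1]))  # ([1, 2, 3], False)
```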
support auto generate for eigvalsh (#52687) · 93404a61
Committed by cyberslack_lee on Apr 10, 2023

【PaddlePaddle Hackathon 4 No.44】Optimize the GPU performance of the logsumexp op in Paddle (#52509) · 0e776965
Committed by Asthestarsfalll on Apr 10, 2023
```
* Optimize the performance of logsumexp

* Support zero-dim tensor
```

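A logsumexp reduction is usually computed in the max-shifted form log(sum(exp(x))) = m + log(sum(exp(x - m))), where m = max(x). This form avoids overflow for large inputs. The sketch below shows that identity in scalar Python. It is not the GPU kernel from this PR, and treating a bare number as a 0-D tensor is an assumption for illustration.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a flat sequence of floats.

    Subtracting the maximum first keeps every exp() argument <= 0, so large
    inputs such as 1000.0 do not overflow.
    """
    if isinstance(values, (int, float)):
        # A 0-D (scalar) input reduces to itself: log(exp(x)) == x.
        return float(values)
    m = max(values)
    if math.isinf(m):
        return m  # all -inf gives -inf; any +inf gives +inf
    return m + math.log(sum(math.exp(v - m) for v in values))

print(logsumexp([1000.0, 1000.0]))  # 1000.0 + log(2), no overflow
print(logsumexp(3.0))               # 3.0
```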
Apr 09, 2023 · 1 commit
- add bf16 for some ops in static mode (#51582) · 6cd095fc
  Committed by shaojie_wang on Apr 08, 2023
