Commits · 1ab562ca05cf8ef5694c5b68caec1b443fbc6e56 · PaddlePaddle / Paddle

May 10, 2023 · 5 commits

[cherry-pick] Fix the index calculation in cross_entroy_kernel. (#53659) (#53666) · 1ab562ca
Committed by Yiqun Liu on May 10, 2023
```
cherry-pick #53659
```
1ab562ca
[Cherry-Pick] Fix bug in log_softmax kernel when lastdim is larger than 100000 (#53657) · a7cad386
Committed by Zhang Zheng on May 10, 2023
```
Fix bug in log_softmax kernel when lastdim is larger than 100000

There is an unexpected log in the calculation

Cherry-Pick: #53654
```
a7cad386
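
As a point of reference for the log_softmax fix above, here is a minimal pure-Python sketch of a numerically stable log_softmax over a long last dimension (100001 elements). It is not Paddle's CUDA kernel: the function name `log_softmax` and the test row are illustrative assumptions. The logarithm is applied exactly once, to the sum of the shifted exponentials.

```python
import math


def log_softmax(row):
    """Stable log_softmax: x - max(x) - log(sum(exp(x - max(x))))."""
    m = max(row)
    # The log is taken once, on the sum of the shifted exponentials.
    log_sum = math.log(math.fsum(math.exp(v - m) for v in row))
    return [v - m - log_sum for v in row]


# Hypothetical input: one row whose last dimension exceeds 100000.
row = [math.sin(i) for i in range(100001)]
out = log_softmax(row)

# Sanity check: exponentiating the result should give probabilities summing to 1.
total = math.fsum(math.exp(v) for v in out)
print(len(out), total)
```

Checking that the exponentiated outputs sum to 1 is a cheap way to catch an extra or missing log in this kind of computation.
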
revert argsort to fix OOM bug (#53647) · 6707142a
Committed by Qi Shao on May 10, 2023
```
Revert argsort to the version without full sort algorithm implemented
```
6707142a

[cherry-pick 2.5] Broadcast && Dropout_nd Performance Optimization into Release/2.5 (#53623) · f9ea2301

Committed by Bo Zhang on May 10, 2023

* Support different dtypes of inputs for broadcast for dropout optimization  (#52093)

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* PR comment

* dropout_nd_optimization (#51479)

* with printf

* add DropOutNdForwardKernel

* PR comment

* Dropout optimize & clean broadcast inT and ElementwiseType (#52969)

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* clean ElementwiseT and InT for BroadcastKernel

* default axis and clean inT

* remove redundant fast divmod computation

* optimize drop_nd & drop_nd_grad

* optimize BroadcastDataLoader bf16 fp16

* rm InT etc. after merge develop

* delete constexpr for windows ci

* fix conflict

* fix conflic with develop

* fix conflic

* new clean

* clean

* Fix xpu2 kp compile error (#53548)

* fix conflict

* conflict

f9ea2301

[Zero-Dim] add 0D Tensor UT case for XPU (#53611) · 3a247cba
Committed by zhouweiwei2014 on May 10, 2023

3a247cba

May 09, 2023 · 4 commits

Cherry pick fused linear (#53621) · f21b6f08
Committed by limingshu on May 09, 2023
```
Cherry pick fused linear
```
f21b6f08

【cherry-pick】Op test add complex support (#53604) · c8504d86

Committed by GGBond8488 on May 09, 2023

* add complex support for  optest

* add complex grad test

* append one

* move some debug info

* move some debug info

* move some debug info

* move some debug info

* add more complex test

* Fix naming ambiguity

* Revert "add more complex test"

This reverts commit dbcb0516b8e53ba42e2d6089878a39b395345969.

* change backward gradient, add TODO

c8504d86

[cherry-pick 2.5][Zero-Dim] support paddle.sum/mean/loss api output 0D (#53601) · b6e23774

Committed by zhouweiwei2014 on May 09, 2023

* [Zero-Dim] fix functool.reduce more safe with intial value, to support empty list (#53182)

* [Zero-Dim] support 0d tensor for shape and squeeze onednn kernel (#52832)

* support 0d tensor for shape and squeeze onednn kernel

* set python api for shape op ut

* [Zero-Dim] distributed scatter/all_to_all support input 0D tensor (#53186)

* [Zero-Dim] Support paddle.sum/mean/loss api output 0D,test=allcase (#52739)

* [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily (#53382)

* [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily

* Add unittest

* [CINN Support 0D-Tensor] CINN hack squeeze2 with trick temporarily (#53454)

* fix test_autograd_dynamic (#53473)
Co-authored-by: zhwesky2010 <zhouwei25@baidu.com>

---------
Co-authored-by: YangQun <qun.yang@intel.com>
Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: HydrogenSulfate <490868991@qq.com>

b6e23774

[Cherry-pick] zero-dim: support 0-D for getitem/setitem (#53441) · 767e7b3f

Committed by JYChen on May 09, 2023

* support 0-D output and 0-D as indice in __getitem__

* fix tests

* fix inference and UT

* add unittest for setitem

* fix xpu test

* fix xpu 0-d

* fix right value is 0d and index is List/Tensor

* Hack__getitem__ from 0-d to 1-d with FLAGS_set_to_1d

* change PHI_DECLARE_xxx to DECLARE_xxx since the change not merged to 2.5

* hack 1-D tensor to Scalar

* throw warning at __getitem__, not slice_utils

767e7b3f

May 08, 2023 · 3 commits

[Cherry-Pick] Fix the calculation of y_grad in divide_backward (#53584) · e63fb1e6

Committed by Zhang Zheng on May 08, 2023

Cherry-Pick: #53582
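Change: for the division out = x / y, the backward formula for y is changed from dy = -dout * out / y to dy = -dout * ((x / y) / y).
Reason: when the forward result is used as the backward input, it has already lost some precision from the cast in low-precision modes, so recomputing x / y gives a more accurate result.
Impact: the result is more accurate, and the performance impact is negligible.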
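
The effect of the two formulas can be illustrated with a small pure-Python sketch. It is not the Paddle kernel: the helper names (`to_fp16`, `grad_y_from_saved_out`, `grad_y_recomputed`) are hypothetical, float16 storage is emulated with the `struct` module's half-precision format, and Python's double-precision arithmetic stands in for whatever higher-precision compute type the real kernel uses (an assumption).

```python
import random
import struct


def to_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", v))[0]


def grad_y_from_saved_out(out16: float, y16: float, dout: float) -> float:
    # Old formula: reuse the forward output, which was already rounded to float16.
    return -dout * out16 / y16


def grad_y_recomputed(x16: float, y16: float, dout: float) -> float:
    # New formula: recompute x / y from the inputs instead of reusing out.
    return -dout * ((x16 / y16) / y16)


random.seed(0)
dout = 1.0
max_err_old = 0.0
max_err_new = 0.0
for _ in range(1000):
    x = to_fp16(random.uniform(1.0, 2.0))
    y = to_fp16(random.uniform(1.0, 2.0))
    out16 = to_fp16(x / y)            # forward result after the cast to float16
    exact = -dout * x / (y * y)       # reference gradient for these inputs
    max_err_old = max(max_err_old, abs(grad_y_from_saved_out(out16, y, dout) - exact))
    max_err_new = max(max_err_new, abs(grad_y_recomputed(x, y, dout) - exact))

print("max error, reusing out:    ", max_err_old)
print("max error, recomputing x/y:", max_err_new)
```

Under these assumptions the recomputed variant avoids the rounding error baked into the saved `out`, at the cost of one extra division per element.
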

e63fb1e6

Cherry-pick #53432 and #53556 (#53576) · 6583c390

Committed by Yiqun Liu on May 08, 2023

* Add fused_gate_attention API. (#53432)
* Add PADDLE_THROW in take_along_axis kernel when the datatype of index is wrong. (#53556)

6583c390

[Cherry-pick]Cherry pick 0d output (#53538) · 2d02b0c1

Committed by GGBond8488 on May 08, 2023

* add 0D output support for inalg.slogdet,test=allcase

* fix zerom dime test error test=allcase

* fix test error test=allcase

* add static backward test, test=allcase

* support_0D_output_for_matrix_rank_multi_dot, test=allcase

* add 0D output test for matrox_rank and mutli_dot test=allcase

* fix assert error ,test=allcase

* fix test error, test=allcase

* fix other test error, test=allcase

* fix other test error, test=allcase

* fix test error, test=allcase

* fix matrix_rank and multi dot test err test=allcase

* fix test error test=allcase

* fix test zero dim test, test=allcase

* add static backward test for multi_dot, test=allcase

* add tol 2d broadcast test case, test=allcase

* fix test error test=allcase

* fix test error test=allcase

* test=allcase

* support_0d_output_for_linalg.norm

* fix test error test=allcase

* fix 0D test

* fix test error test=allcase

* fix test error test=allcase

* fix tets,test=allcase

* fix error,test=allcase

* fix errors ,test=allcase

* add static backward , test=allcase

* add static backwward test, test=allcase

* slogdet_support_0D_output

* add new case

* fix tests, test=allcase

* cherry-pick

* cherry-pick

* fix trace gpu kernel 0d error, test=allcase

* fix windows error, test=allcase

* add matrixrank cherry-pick

2d02b0c1

May 06, 2023 · 2 commits
- [cherry-pick]add flash randomness control and add scaled_dot_product_attention (#53518) · 1d23e0bb
  Committed by zhangkaihuo on May 06, 2023
```
att, cherry-pick: #52902 #53113
```
  1d23e0bb
- [Cherry-Pick] AMP OP&Test support from Hackathon (#53522) · 39b704c1
  Committed by Zhang Zheng on May 06, 2023
```
Low-precision operator support and additional unit tests. Cherry-picks 17 Hackathon PRs together, covering low-precision support and improvements for 25 OPs.
```
  39b704c1
April 27, 2023 · 2 commits

[cherry-pick2.5] [Zero-Dim] Support... · b6996598

Committed by zhouweiwei2014 on April 27, 2023

[cherry-pick2.5] [Zero-Dim] Support all/any/min/max/prod/logsumexp/amax/amin/some loss output 0D (#53192)

b6996598

[Cherry-Pick]Support output 0D for... · f84ac449

Committed by wangfengsheng1999 on April 27, 2023

[Cherry-Pick]Support output 0D for is_empty/as_complex/inner/dot/rank/tensordot/squeeze_/static.accuracy/static.auc/metric.accuracy (#53199)

* support output 0D for is_empty/as_complex/inner/dot/rank/tensordot/squeeze_/static.accuracy/static.auc/metric.accuracy

* test_dot_py

* test_dot_py

f84ac449

April 25, 2023 · 2 commits
- Reduce inference library size and compile time (#53193) · ac01ddda
  Committed by chalsliu on April 25, 2023
```
* Reduce inference library size and compilation time
* fix docstring
```
  ac01ddda
- fix dist_grad kernel (#53239) (#53279) · 5638554d
  Committed by Leo Chen on April 25, 2023
  
  5638554d
April 24, 2023 · 1 commit
- Revert "Cherry pick getitem/setitem 0d (#53125)" (#53265) · 50f61213
  Committed by JYChen on April 24, 2023
```
This reverts commit a79c04f3.
```
  50f61213
April 23, 2023 · 1 commit

Cherry pick getitem/setitem 0d (#53125) · a79c04f3

Committed by JYChen on April 23, 2023

* support 0-D output and 0-D as indice in __getitem__

* fix tests

* fix inference and UT

* add unittest for setitem

* fix xpu test

* fix xpu 0-d

a79c04f3

April 21, 2023 · 1 commit

Cherry pick fix set value cpu (#53127) · e4178284

Committed by JYChen on April 21, 2023

* fix the set_value error in cpu

* add a unitest for set_value OP

* fix platform::is_gpu_place

* add todo note for set_value

* fix test

e4178284

April 20, 2023 · 1 commit

[Cherey-Pick]Support 0D for slogdet (#53087) · 3f5058e6

Committed by GGBond8488 on April 20, 2023

* add 0D output support for inalg.slogdet,test=allcase

* fix zerom dime test error test=allcase

* fix test error test=allcase

* add static backward test, test=allcase

3f5058e6

April 19, 2023 · 1 commit
- [Cherry-Pick] Unique support float16&bfloat16 (#53023) · 00b7c819
  Committed by Zhang Zheng on April 19, 2023
```
unique now supports the float16 and bfloat16 data types, and the related unit tests are improved.
```
  00b7c819
April 17, 2023 · 7 commits

[Fused] controlled randomness for fused dropout add (#52903) · e36f80c6
Committed by Chitsing KUI on April 17, 2023
```
* add random control for fused dropout add

* add __init__
```
e36f80c6
[AMP OP&Test]Add BF16 implementation and unit tests of multinomial (#52898) · d19d2486
Committed by Vvsmile on April 17, 2023
```
* fix multinomial

* fix test_elementwise

* fix convert_float_to_uint16

* aadd test_multimial_op

* fix code style
```
d19d2486

【PaddlePaddle Hackathon 4 No.49】: Support the float16 data type for Paddle bce_loss (#50930) · 44e6de98

Committed by thunder95 on April 17, 2023

* untracked files

* bce_loss_fp16

* remove unused files

* back max_rel_erro still big

* simplify code

* upd

* fix max_relative_error

* restart ci

* Update test_bce_loss.py

* Update test_bce_loss.py

* Update test_bce_loss.py

* Update test_bce_loss.py

* try to pass test

* restore file

* remove error value

* fix bug

---------
Co-authored-by: Zhang Ting <Douyaer2020@qq.com>

44e6de98

【Eager】fix multiply double grad error (#52870) · cf3ddf24
Committed by Jiabin Yang on April 17, 2023
```
* fix multiply double grad error

* fix multiply dy only kenrel
```
cf3ddf24

【Hackathon No.32】Optimize the GPU compute performance of the expand_as forward & backward ops for Paddle (#52700) · 3c44e948

Committed by Hanchiao on April 17, 2023

* Implement optimized kernel for OP-expand_as.

* Support fp16.
Co-authored-by: Timber-Ye <ye_hanqiao@163.com>
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>

* remove fp16 support

* remove MAX_RANK_SUPPORTED

---------
Co-authored-by: BrianQian1999 <brianqianhitsz@gmail.com>

3c44e948

rename_SliceKernel (#52863) · d2b0d63f
Committed by zhangyuqin1998 on April 17, 2023

d2b0d63f

Add output defs for some kernelsPhi register (#52941) · 23f87442

Committed by Sonder on April 17, 2023

* add register info for eigh and eig_gard

* add sync_batch_norm_op.cu register info

* add lamb output register info

* add unique register info

* change type name

* change type name

* add output register info for check_finite_and_unscale

* update cmake and config file

* add register info for adagrad

* fix build error

* add sync to run_unittests.sh

* add register info for unique_consecutive

* fix build error

* add eigh to STATIC_BUILD_TESTS

* update eig_kernel.cc

* update eig_kernel.cc

* fix infer mate error

* fix unique register error

* fix lamb register info error

* fix lamb register info

* update lamb register info

* fix lamb

* remove one Output Register

* update static build file

* add eigh op to disable_wingpu_test

* update run_unittests

23f87442

April 14, 2023 · 10 commits
- [AMP OP&Test] Cumprod support fp16 and bf16 (#52919) · 8a850af6
  Committed by Zhang Zheng on April 14, 2023
  
  8a850af6
- 【Hackathon4 No58】logcumsum logsum (#51275) · 468869e4
  Committed by cyberslack_lee on April 14, 2023
  
  468869e4
- 【Hackathon4 No58】kthvalue (#51615) · 43efb979
  Committed by cyberslack_lee on April 14, 2023
  
  43efb979
- 【Hackathon No.62】Improve FP16/BF16 unit tests for the digamma and dirichlet operators (#52604) · 7ecbcc08
  Committed by chenxujun on April 14, 2023
```
* Add digamma, dirichlet tests

* Fix code
```
  7ecbcc08
- 【Hackathon No.55】add erf FP16 test and BF16 test (#52136) · eeb4d165
  Committed by superwinner1 on April 14, 2023
```
* add erf FP16 test
```
  eeb4d165
- Add angle,bmm tests (#52630) · 6d7ee668
  Committed by chenxujun on April 14, 2023
  
  6d7ee668
- [Dcu]: Add rocsparse_spmm for dcu. (#52200) · 281ea2f4
  Committed by umiswing on April 14, 2023
  
  281ea2f4
- [Zero-Dim] support 0-D tensor for... · 6f41e177
  Committed by YangQun on April 14, 2023
```
[Zero-Dim] support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussion onednn kernels (#52185)

* support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussion ops

* fix gaussian random mkldnn op ut
```
  6f41e177
- [phi] move sequence_pool to phi - Step 2 : sequence_pool_op (#52750) · b281b221
  Committed by gouzil on April 14, 2023
```
* [phi] move sequence_pool kernel to phi

* [phi] mv sequence_pooling to phi funcs

* [phi] mv sequence_pooling_test

* [phi] RollBACK `paddle/fluid/operators/sequence_ops/sequence_pool_op.cc`

* [phi][funcs] fix mutable_data

* [phi][funcs] fix mutable_data
```
  b281b221
- fix win cu116 compile error (#52894) · 60ba559a
  Committed by sneaxiy on April 14, 2023
  
  60ba559a

PaddlePaddle / Paddle synced successfully about 2 years ago