Commits · ddf94ae4adaad7e20cb117b962dc8901e47e0e06 · PaddlePaddle / Paddle

30 Apr, 2023 1 commit
- [Zero-Dim] Support paddle.sum/mean/loss api output 0D,test=allcase (#52739) · ddf94ae4
  Committed by zhouweiwei2014 on Apr 30, 2023
  
  ddf94ae4
28 Apr, 2023 10 commits

Dropout optimize & clean broadcast inT and ElementwiseType (#52969) · d611e48c

Committed by Bo Zhang on Apr 28, 2023

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* clean ElementwiseT and InT for BroadcastKernel

* default axis and clean inT

* remove redundant fast divmod computation

* optimize drop_nd & drop_nd_grad

* optimize BroadcastDataLoader bf16 fp16

* rm InT etc. after merge develop

* delete constexpr for windows ci

* fix conflict

* fix conflict with develop

* fix conflict

* new clean

* clean

d611e48c

【0D output】add_0D_output_support (#52857) · ef6e8d09

Committed by GGBond8488 on Apr 28, 2023

* add 0d support for dist, trace, paddle.linalg.cond test=allcase

* add_0d_output_support_for_det

* test=allcase

* support_0d_output_for_linalg.norm

* support linalg.norm 0d output, test=allcase

* fix 0D test

* fix zero dim test, test=allcase

* fix 0D test

* fix tests,test=allcase

* fix error,test=allcase

* fix errors ,test=allcase

* add static backward , test=allcase

* add static backward test, test=allcase

* fix pr-ci-build error;test=document_fix (#53060)

* [Cherry-Pick] Unique support float16&bfloat16 (#53023)

Add float16 and bfloat16 data type support to unique, and improve the related unit tests.

* slogdet_support_0D_output

* add new case

* fix tests, test=allcase

* fix p_norm related test, test=allcase

* fix some err, test=allcase

* test=allcase

* move out trace

* open some case, test=allcase

* fix norm all case, test=allcase

* fix some test error, test=allcase

* fix typo,test=allcase

* fix test err, test=allcase

* test=allcase

* test

* fix test error, test=allcase

* fix test error, test=allcase

* fallback norm, test=allcase

---------
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: Zhang Zheng <32410583+ZzSean@users.noreply.github.com>

ef6e8d09

[Zero-Dim] Support output 0D for squeeze, unbind, unstack. (#52843) · 6adfcdf6

Committed by zqw_1997 on Apr 28, 2023

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* test=allcase

* fix test cases, test=allcase

* fix test cases, test=allcase

* modify the test_squeeze to not use Tensor type axis, test=allcase

* add grad check for unbind and unstack, test=allcase

* check for squeeze axis tensor type, test=allcase

* fix bug, test=allcase

6adfcdf6

Support static graph code generation for op edit_distance (#53297) · 396fe483
Committed by huangjiyi on Apr 28, 2023
```
* update

* fix bug

* support parsing fixed kernel data_type

* update op_compat

* update
```
396fe483
Support static graph code-gen for unpool (#52947) · 005fee12
Committed by Sanbu on Apr 28, 2023

005fee12
【Hackathon No.52】Add float16 data type support for the Paddle dist operator (#50915) · 9c406531
Committed by iSerendipity on Apr 28, 2023

9c406531
[XPU][BUG] Add cumsum grad kernel to xpu2 op list (#53386) · 1c1b487c
Committed by lj970926 on Apr 28, 2023
```
* clang format

* add cumsum_grad op to xpu2_op_list
```
1c1b487c
Add broadcast_tensors tests (#52961) · 98fc4277
Committed by co63oc on Apr 28, 2023

98fc4277
【Hackathon No.55】add fmin BF16 test (#53100) · 8163faaa
Committed by superwinner1 on Apr 28, 2023
```
* 'fmin'

* 'fix'

* 'fix'
```
8163faaa

【Prim】comp_elementwise_double_grad (first part) (#53385) · 05499c71

Committed by xiaoguoguo626807 on Apr 28, 2023

* add mul double grad

* add sub_double_grad

* add add sub high test

* add multiply test

* modify other unsqueeze

* delete api.yaml

* only for make ci run

* modify unsqueeze

* modify unsqueeze

* tmp

* modify operants gen

05499c71

27 Apr, 2023 15 commits

Support different dtypes of inputs for broadcast for dropout optimization (#52093) · 3474e09c

Committed by Bo Zhang on Apr 27, 2023

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* PR comment

3474e09c

[phi] Move sequence_pool to phi - Step 3: sequence_pool_grad_op (#52680) · fe053396

Committed by gouzil on Apr 27, 2023

* [phi] move sequence_pool kernel to phi

* mv kernels impl

* fix parameter error

* clean include

* fix compat filename

* [phi] move fluid sequence_pool_grad to phi

* [phi][compat] sig rm GradVarName

* [phi] fix sequence_pool out type

* [phi] rm impl, add const string

* [phi] fix const str

* fix sequence_pooling cmake

* [phi] mv sequence_pooling_test

* [phi] fix grad sig

* [phi] fix sequence_pool is_test error

* [phi] fix sequence_pooling gpu include

* [phi] mv to impl

* [phi] fix SequencePoolFunctor cu include

* [phi] modify out max_index int32_t

* [phi] add pooltype mapping determine

* [phi] fix sequence_pool_sig

* [phi] fix sequence_pool_sig sum

* [phi] try ci

* [phi] fix max_index optional

fe053396

[XPU] c_sync_calc_stream support more types (#53389) · 9c1eb98a
Committed by houj04 on Apr 27, 2023

9c1eb98a

[static op generation] triangular_solve (#53328) · 18968e7e

Committed by gouzil on Apr 27, 2023

* [static op generation] triangular_solve

* [phi] mv triangular_solve_grad to static_backward

* [phi] fix import

* [phi] mv to ops.yaml, backward.yaml

* fix forward attr

* [phi] fix triangular_solve_grad args

18968e7e

autogen code support for max_pool[2,3]_with_index op (#53359) · cf6cbc34
Committed by Wang Xin on Apr 27, 2023

cf6cbc34

【PaddlePaddle Hackathon 4】Add float16 data type support for the maxout operator (#50976) · 8bfd978f

Committed by NetPunk on Apr 27, 2023

* support fp16 for maxout op

* format code

* change api

* add test for static float16

* format code

* formatting code

* atol alignment

* experiment-1

* experiment-2

* experiment-3

* format code

8bfd978f

Move fused feedforward (#53166) · 25b4ba7f

Committed by Sonder on Apr 27, 2023

* trans fused_feedforward Compute function to phi

* add register info

* remove maxfunctor

* move fused feedforward to phi

* remove sig file

* remove fluid include

* add include

* add include

* add sig file

* add output register info

* fix sig file

* Update fused_feedforward_sig.cc

* fix grad kernel

* update output register info

* fix

* open fused_feedforward static build

* add optional and fix code style

* fix output info for fused attention

* add optional param

* merge

25b4ba7f

【prim】Concat bug (#53350) · 6768c6ec
Committed by xiaoguoguo626807 on Apr 27, 2023
```
* modify concat_grad add sum comp rule

* modify opcompat
```
6768c6ec
Hack __getitem__ from 0-d to 1-d with FLAGS_set_to_1d (#53358) · 1bd468e2
Committed by JYChen on Apr 27, 2023

1bd468e2
fix softmax assert error (#53360) · c50f5fa4
Committed by engineer1109 on Apr 27, 2023

c50f5fa4

remove some [-Wunused-parameter] warning (#53365) · 0fac3281

Committed by Galaxy1458 on Apr 27, 2023

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

0fac3281

[XPU] remove scale_loss in parallel.py (#53337) · 2e1ac529
Committed by houj04 on Apr 27, 2023
```
* [XPU] remove scale_loss in parallel.py

* [XPU] throw Unimplemented when using Reducer
```
2e1ac529
【Hackathon No.55】add fmax BF16 test (#51925) · 8a6ad6e5
Committed by superwinner1 on Apr 27, 2023

8a6ad6e5
【Hackathon4】No5 nextafter (#52544) · 82ac3913
Committed by cyberslack_lee on Apr 27, 2023

82ac3913

Pad grad (#53374) · bfeedd29

Committed by mengziheng on Apr 27, 2023

* add pad op

* add_some_code

* modify some code

* add some code

* add some code

* modify some code

* add some code

* modify some code

* Update composite_backward_api.h

* modify some code

* add some code

* add some code

* add some code

bfeedd29

26 Apr, 2023 10 commits
- [Zero-Dim] distributed scatter/all_to_all support input 0D tensor (#53186) · 0b6dd535
  Committed by zhouweiwei2014 on Apr 26, 2023
  
  0b6dd535
- 【prim】scatter_nd_add_grad (#52469) · 55c4eb8a
  Committed by mhy-666 on Apr 26, 2023
```
* add scatter_nd_add comp

* add scatter_nd_add prim

* fix

* fix

* add public_python_api in TestScatterNdAddSimpleOp setup function

* fix composite_backward_api.h

* fix composite_backward

* add test cases

* fix composite_backward_api.h, unittest
```
  55c4eb8a
- Fix fused_attention_op and fused_feedforward_op bugs in xpu (#53318) · 1164626c
  Committed by Ruibiao Chen on Apr 26, 2023
```
* Fix fused_attention_op and fused_feedforward_op bugs in xpu

* Fix d_x alloc errors for fused_feedforward_grad_kernel
```
  1164626c
- remove some [-Wunused-parameter] warning (#53319) · f9e5072b
  Committed by Galaxy1458 on Apr 26, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop
```
  f9e5072b
- Optimize c_embedding op in deterministic mode (#53197) · 35f5c245
  Committed by sneaxiy on Apr 26, 2023
```
* optimize embedding deterministic mode

* fix compile error

* change FLAGS_cudnn_deterministic to int64

* fix 700 error

* add ut

* fix ut

* fix ut

* fix win32 ci

* fix flags with PHI_DEFINE_EXPORTED_int64
```
  35f5c245
- 【Hackathon No.48】Add float16 data type support for the Paddle determinant operator (#53286) · 2a705b74
  Committed by denglianbin on Apr 26, 2023
  
  2a705b74
- 【Hackathon No.48】Add float16 data type support for the Paddle meshgrid operator (#53284) · 9127cc3c
  Committed by denglianbin on Apr 26, 2023
  
  9127cc3c
- [Bug Fixes] fix bugs when using cast<int64_t, int32_t> in xpu/cross_entropy... · 1d549400
  Committed by Lucas on Apr 26, 2023
```
[Bug Fixes] fix bugs when using cast<int64_t, int32_t> in xpu/cross_entropy kernels, *test=kunlun (#53325)
```
  1d549400
- Optimize prompt information (#53291) · 3ec12c2b
  Committed by risemeup1 on Apr 26, 2023
```
* Optimize prompt information

* add_information

* add_information
```
  3ec12c2b
- add autogen code support for box_coder op (#53309) · ed040a17
  Committed by Wang Xin on Apr 26, 2023
  
  ed040a17
25 Apr, 2023 4 commits
- Add singly compile gpu kernel cmake function (#53305) · af986bd5
  Committed by lzydev on Apr 25, 2023
```
* support register single .cu file

* add register GPU kernel function
```
  af986bd5
- update tile_grad composite rule (#53261) · dda6b9d5
  Committed by ccrrong on Apr 25, 2023
  
  dda6b9d5
- 【PaddlePaddle Hackathon 4 No.33】Optimize the GPU performance of the Histogram op for Paddle (#53112) · c1a61fc0
  Committed by Zero Rains on Apr 25, 2023
```
* create KernelMinMax to optimize the performance of histogram op in GPU

* change to block and warp wise operation

* remove the time in DtoH

* fix a bug
```
  c1a61fc0
- [PHI] Add flags macro for PHI (#52991) · 22e96bde
  Committed by YuanRisheng on Apr 25, 2023
```
* add flags for phi

* fix compile bugs

* fix ci bugs

* fix inference bugs

* fix cinn' bugs

* fix cinn bugs

* perfect code according to comment

* fix ci bugs

* fix ci bugs
```
  22e96bde

PaddlePaddle / Paddle synced successfully about 1 year ago