提交 · cf3ddf24cd047f4ac7a24525bc84cb231538600b · PaddlePaddle / Paddle

Apr 17, 2023 · 13 commits

【Eager】fix multiply double grad error (#52870) · cf3ddf24
Committed by Jiabin Yang on Apr 17, 2023
```
* fix multiply double grad error

* fix multiply dy only kernel
```
cf3ddf24

【Hackathon No.32】Optimize the GPU performance of the expand_as forward & backward ops for Paddle (#52700) · 3c44e948

Committed by Hanchiao on Apr 17, 2023

* Implement optimized kernel for OP-expand_as.

* Support fp16.
Co-authored-by: Timber-Ye <ye_hanqiao@163.com>
Co-authored-by: NBrianQian1999 <brianqianhitsz@gmail.com>

* remove fp16 support

* remove MAX_RANK_SUPPORTED

---------
Co-authored-by: NBrianQian1999 <brianqianhitsz@gmail.com>

3c44e948
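
The commit above optimizes the GPU kernels but does not restate what expand_as computes. As a point of reference, here is a minimal CPU sketch of the semantics, assuming expand_as follows NumPy-style broadcasting (dimensions right-aligned, size-1 input dimensions repeated). All function names are hypothetical and this is not the optimized kernel. The backward pass reuses the same index mapping and sums the output gradients that map onto each input element.

```python
from itertools import product
from math import prod


def _broadcast_strides(in_shape, out_shape):
    """Right-align in_shape to out_shape and return per-output-dim input strides.

    Broadcast dimensions (input size 1) get stride 0, so every output index
    along them reads the same input element.
    """
    pad = len(out_shape) - len(in_shape)
    if pad < 0:
        raise ValueError("input has more dims than the target shape")
    shape = [1] * pad + list(in_shape)
    for s, t in zip(shape, out_shape):
        if s != t and s != 1:
            raise ValueError(f"cannot expand dim of size {s} to {t}")
    strides = [0] * len(shape)
    acc = 1
    for i in range(len(shape) - 1, -1, -1):
        strides[i] = 0 if shape[i] == 1 else acc
        acc *= shape[i]
    return strides


def _input_offsets(in_shape, out_shape):
    # Yields, in row-major output order, the flat input offset each output element reads.
    strides = _broadcast_strides(in_shape, out_shape)
    for idx in product(*(range(n) for n in out_shape)):
        yield sum(i * s for i, s in zip(idx, strides))


def expand_as_forward(x, in_shape, out_shape):
    # x is a flat row-major list with prod(in_shape) elements.
    assert len(x) == prod(in_shape)
    return [x[off] for off in _input_offsets(in_shape, out_shape)]


def expand_as_backward(grad_out, in_shape, out_shape):
    # Each input element fed several outputs, so its gradient is their sum.
    grad_in = [0.0] * prod(in_shape)
    for g, off in zip(grad_out, _input_offsets(in_shape, out_shape)):
        grad_in[off] += g
    return grad_in


print(expand_as_forward([1, 2], [2, 1], [2, 3]))       # [1, 1, 1, 2, 2, 2]
print(expand_as_backward([1.0] * 6, [2, 1], [2, 3]))  # [3.0, 3.0]
```

The backward pass is a reduction over the broadcast dimensions, which is why it usually needs a different GPU strategy from the forward copy.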

rem cncl keyword in py (#52939) · ea04bef8
Committed by Kim Yann on Apr 17, 2023

ea04bef8

rename_SliceKernel (#52863) · d2b0d63f
Committed by zhangyuqin1998 on Apr 17, 2023

d2b0d63f

remove hccl in some .cc files (#52942) · 514d83de
Committed by 张春乔 on Apr 17, 2023

514d83de
remove hccl in .py files (#52934) · 27a601e8
Committed by 张春乔 on Apr 17, 2023
```
* remove hccl in .py files

* remove ascend in setup.py.in

* remove ascend in setup.py
```
27a601e8

Add output defs for some kernelsPhi register (#52941) · 23f87442

Committed by Sonder on Apr 17, 2023

* add register info for eigh and eig_grad

* add sync_batch_norm_op.cu register info

* add lamb output register info

* add unique register info

* change type name

* change type name

* add output register info for check_finite_and_unscale

* update cmake and config file

* add register info for adagrad

* fix build error

* add sync to run_unittests.sh

* add register info for unique_consecutive

* fix build error

* add eigh to STATIC_BUILD_TESTS

* update eig_kernel.cc

* update eig_kernel.cc

* fix infer meta error

* fix unique register error

* fix lamb register info error

* fix lamb register info

* update lamb register info

* fix lamb

* remove one Output Register

* update static build file

* add eigh op to disable_wingpu_test

* update run_unittests

23f87442

Fix typos, test=document_fix (#52937) · 002f2185
Committed by chenxujun on Apr 17, 2023

002f2185
[CustomDevice] fix custom cpu unittests failure when multiple python exist in CI (#52923) · c3417fd2
Committed by ronnywang on Apr 17, 2023
```
* [CustomDevice] fix custom cpu unittests failure when multiple python exist in CI

* Update test_custom_op_setup.py
```
c3417fd2
[AMP OP&Test] Sync_batch_norm support bfloat16 (#52921) · 1080d4fc
Committed by Zhang Zheng on Apr 17, 2023
```
* [AMP OP&Test] Sync_batch_norm support bfloat16

* fix

* fix
```
1080d4fc

[Dygraph] Support delaying div loss by accumulate_steps in PipelineLayer (#52848) · 0abdcff6
Committed by Haohongxiang on Apr 17, 2023

0abdcff6
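
The commit title gives no detail on the mechanism. Assuming "delaying div loss" means dividing by accumulate_steps once, after accumulating over micro-batches, instead of scaling each micro-batch loss, the sketch below checks that both orders give the same accumulated gradient, since differentiation is linear. The one-parameter squared-error model and all names here are hypothetical, chosen only to make the gradients easy to compute by hand.

```python
# Hypothetical toy model: loss_i(w) = (w * x_i - y_i) ** 2, with a scalar weight w.
def grad(w, x, y):
    # d/dw (w*x - y)^2 = 2 * x * (w*x - y)
    return 2.0 * x * (w * x - y)


def accumulate_eager(w, micro_batches, accumulate_steps):
    # Scale each micro-batch gradient by 1/accumulate_steps before accumulating.
    total = 0.0
    for x, y in micro_batches:
        total += grad(w, x, y) / accumulate_steps
    return total


def accumulate_delayed(w, micro_batches, accumulate_steps):
    # Accumulate the raw gradients, then divide once at the end.
    total = 0.0
    for x, y in micro_batches:
        total += grad(w, x, y)
    return total / accumulate_steps


batches = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (0.5, 0.0)]
eager = accumulate_eager(0.5, batches, len(batches))
delayed = accumulate_delayed(0.5, batches, len(batches))
print(eager, delayed)  # -5.9375 -5.9375
```

In general floating-point arithmetic the two orders can differ by rounding; this example uses values where every intermediate result is exact.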
[Auto Parallel]Add o2 tune of rule based tuner (#52928) · 118a7415
Committed by caozhou on Apr 17, 2023
```
* add o2 tune

* add unittest

* fix error

* set unittest timeout
```
118a7415

remove timeout (#52959) · d7659ce4
Committed by sneaxiy on Apr 17, 2023

d7659ce4

Apr 15, 2023 · 1 commit

[Opt CustomOP] Optimize the perf and impl of custom grad operator (#52915) · 0afef498
Committed by HongyuJia on Apr 15, 2023

0afef498
Apr 14, 2023 · 26 commits

[AMP OP&Test] Cumprod support fp16 and bf16 (#52919) · 8a850af6
Committed by Zhang Zheng on Apr 14, 2023

8a850af6
delete SupportNPU(), SupportMLU() (#52911) · 8601859e
Committed by jjyaoao on Apr 14, 2023
```
* delete SupportNPU(), SupportMLU()

* delete npu branch
```
8601859e

【Hackathon4 No58】logcumsum logsum (#51275) · 468869e4
Committed by cyberslack_lee on Apr 14, 2023

468869e4

【Hackathon4 No58】kthvalue (#51615) · 43efb979
Committed by cyberslack_lee on Apr 14, 2023

43efb979
【Hackathon No.62】Improve the FP16/BF16 unit tests for the digamma and dirichlet operators (#52604) · 7ecbcc08
Committed by chenxujun on Apr 14, 2023
```
* Add digamma, dirichlet tests

* Fix code
```
7ecbcc08
【Hackathon No.55】add erf FP16 test and BF16 test (#52136) · eeb4d165
Committed by superwinner1 on Apr 14, 2023
```
* add erf FP16 test
```
eeb4d165

update_npu_check_finite_and_unscale (#52914) · ddcc1002
Committed by duanyanhui on Apr 14, 2023

ddcc1002

Add angle,bmm tests (#52630) · 6d7ee668
Committed by chenxujun on Apr 14, 2023

6d7ee668

[Dcu]: Add rocsparse_spmm for dcu. (#52200) · 281ea2f4
Committed by umiswing on Apr 14, 2023

281ea2f4

apply gcc12 to py3 ci (#52179) · 5fbcf37d

Committed by risemeup1 on Apr 14, 2023

* apply gcc12 to py3-ci

* apply gcc12 to py3-ci

* apply gcc12 to py3-ci

* test

* test

* test

* test

* make mirror

* test

* test

* test

* test debug

* test

* update cuda to 12

* update cuda to 12

* update cuda to 12

* apply gcc12 to py3

* fix gcc12 problem

* test

* apply gcc12 to py3

* test

* test

* test

* apply gcc12 to py3

5fbcf37d

[Zero-Dim] support 0-D tensor for... · 6f41e177

Committed by YangQun on Apr 14, 2023

[Zero-Dim] support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussion onednn kernels (#52185)

* support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussian ops

* fix gaussian random mkldnn op ut

6f41e177

[Decouple enforce.h] Move LOG from enforce.h to enforce.cc (#52883) · b33f95b0

Committed by HongyuJia on Apr 14, 2023

* [Decouple enforce.h] Move LOG from enforce.h to enforce.cc

* update cmake of device_context.cc, solve cuda_device_context_allocator.h compile error

* add namespace inside macro

b33f95b0

[CustomOP Unittest] Optimize unit test, save setUp time (#52889) · b66c833f
Committed by HongyuJia on Apr 14, 2023

b66c833f

1. modify set_value op, use Scalars to represent attr `values`, instead of a... · dd2a749a

Committed by Feiyu Chan on Apr 14, 2023

1. modify the set_value op to use Scalars to represent the attr `values`, instead of a bunch of attributes of various types; (#52408)

2. add a program converter, with the set_value op as an example, which provides the functionality to convert `paddle::framework::ProgramDesc` between the old and new formats (the differences are mainly in some operators whose definitions received incompatible updates);
3. the program version and operator version map are now always saved when serializing `paddle::framework::ProgramDesc`, to identify its version;
4. provide an option `legacy_format=false` in serialization of `paddle::framework::ProgramDesc`, which decides whether to convert the ProgramDesc back to a legacy format that paddle 2.4.2 or earlier versions can load and execute;
5. deserialization of `paddle::framework::ProgramDesc` now automatically detects whether the bytes it receives are in the legacy format (contains any of the operators that have been incompatibly updated and has any attribute of type `Scalar`) and converts them to the new format. If you want a faithful deserialization without the automatic conversion, you can use protobuf's deserialization instead. Though not recommended, it can be used for testing purposes.

dd2a749a

[phi] move sequence_pool to phi - Step 2 : sequence_pool_op (#52750) · b281b221

Committed by gouzil on Apr 14, 2023

* [phi] move sequence_pool kernel to phi

* [phi] mv sequence_pooling to phi funcs

* [phi] mv sequence_pooling_test

* [phi] RollBACK `paddle/fluid/operators/sequence_ops/sequence_pool_op.cc`

* [phi][funcs] fix mutable_data

* [phi][funcs] fix mutable_data

b281b221

Move fused_attention op to phi [Migrate backward GPU OpKernel] (#51909) · 3bac6264

Committed by Sonder on Apr 14, 2023

* add kernel functions

* update kernel functions

* update func parameters' name

* create codes for gpu device

* adjust file locations

* fix include error

* remove dependent files to phi/

* restore fused_attention_op.cu

* fix dependence errors

* fix dependence errors

* fix include error

* fix all dependence errors[build success]

* remove useless include

* recover useless include

* use phi::ToNCCLDataType

* fix namespace

* update new register code

* fix error in fused_gemm_epilogue_utils

* fix error in FusedAttentionKernel parm

* finish fused_attention register code[build success]

* add paddle::optional

* add sig file

* fix build error

* fix a include error

* restore forward code

* update CMakeLists

* trans Compute function to phi [build success]

* add register code and fix include error [build success]

* fix parameter sequence

* add include file

* update #if before include

* update #if before include

* fix grammly error

* update codes for DropoutParam

* remove const cast

* trans some fluid api to phi api

* remove const cast

* trans some fluid api to phi api

* add #if

* update test code

* update test codes

* recover test codes

* fix namespace and remove fluid include

* recover random seed

* remove fluid quant_helper

* fix include error

* include utils in funcs

* change include file

* move grad codes back to fluid folder

* move grad codes back to fluid folder

* fix sig file error

* update include

* recover codes to develop

* update register codes

* fix build error

* recover fluid include

* remove some fluid include

* remove some fluid include

* Update fused_attention_op.cu

* remove fluid include

* add some fluid include

* Update fused_attention_op.cu

* Update fused_attention_op.cu

* Update fused_attention_op.cu

* Update fused_attention_op.cu

* remove useless include

3bac6264

fix some [-Wunused-function] and [-Wunused-function] warning (#52868) · ab163063
Committed by Galaxy1458 on Apr 14, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop
```
ab163063

【Prim】Add more infer var type (#52818) · 630d14f5

Committed by Jiabin Yang on Apr 14, 2023

* add more infer var type

* fix split error

* fix ut

* fix top_k infer vartype

* fix top_k infer vartype

630d14f5

add backend config to select kernel (#52907) · 1ab7e77a
Committed by lzydev on Apr 14, 2023

1ab7e77a

fix win cu116 compile error (#52894) · 60ba559a
Committed by sneaxiy on Apr 14, 2023

60ba559a

delete cast if lookup_table_v2 support fp16; delete repeated ops (#52888) · 7aafeb45
Committed by zhupengyang on Apr 14, 2023

7aafeb45

add npu to device_guard (#52774) · 64b4aaba
Committed by duanyanhui on Apr 14, 2023

64b4aaba
[Function optimization] support uint16 python op in d2s (#52809) · 6d231b02
Committed by 骑马小猫 on Apr 14, 2023
```
* support uint16 python op in d2s

* convert uint16 -> bfloat16 in docstring
```
6d231b02
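
The commit treats uint16 as the carrier type for bfloat16 values (hence "convert uint16 -> bfloat16 in docstring"). As background: bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an IEEE-754 float32, so a bfloat16 value fits in a 16-bit unsigned integer. The sketch below is illustrative only; the helper names are hypothetical, and it truncates the low mantissa bits, whereas production converters typically round to nearest even.

```python
import struct


def float_to_bf16_bits(value: float) -> int:
    # Reinterpret the value as IEEE-754 float32 bits (big-endian packing).
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    # bfloat16 is the upper 16 bits of float32; this sketch truncates instead of rounding.
    return bits32 >> 16


def bf16_bits_to_float(bits16: int) -> float:
    # Widen back to float32 by zero-filling the dropped low 16 mantissa bits.
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


for x in [1.0, -2.5, 3.14159]:
    b = float_to_bf16_bits(x)
    print(f"{x} -> 0x{b:04X} -> {bf16_bits_to_float(b)}")
```

Values such as 1.0 and -2.5 round-trip exactly, while 3.14159 comes back as 3.140625 because only 7 explicit mantissa bits survive.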

rem cncl (#52434) · 25bd5ed8
Committed by Kim Yann on Apr 14, 2023

25bd5ed8

[AMP] Unify the static amp codes of fp16 and bf16. (#52694) · dfcba7f4

Committed by Yiqun Liu on Apr 14, 2023

* Unify the static amp codes of fp16 and bf16.

* Polish apis and add unittest.

* Add operator stats collecting tools for program.

* Add the check of number of bfloat16 operators in unittest.

* Add warning for operator not supported for amp.

* Add testing of BF16 O1 and O2.

dfcba7f4

[CustomDevice] add model parallel support for custom device (#52872) · f8d09011
Committed by ronnywang on Apr 14, 2023

f8d09011
