Commits · f78b4079b40bb85b083ab2ef853c87baab7c3f95 · PaddlePaddle / Paddle

Feb 27, 2023 · 36 commits

[CINN] fix cinn cache key should save var name bug (#50955) · f78b4079
Committed by jiangcheng on Feb 27, 2023

Add inferface of get registered phi kernels (#50814) · 0f8c304a

Committed by zyfncg on Feb 27, 2023

* add inferface of get registered phi kernels

* change KernelType to KernelKey

* add test

* refactor code

[XPU] add fp16 support for shape and lookup_table_v2 op. (#50773) · d2a0577a

Committed by houj04 on Feb 27, 2023

* [XPU] add fp16 support for shape op.

* [XPU] add fp16 support for lookup_table_v2 op.

* update approval list: add qingshu's id.

handle trt engine deserialization failure and rebuild (#50775) · 377cbcea
Committed by Zhang Jun on Feb 27, 2023

【Hackathon No.68】Remove utils in phi (#50833) · 6c181d1d

Committed by 张春乔 on Feb 27, 2023

* remove utils

* remove utils

* remove utils

* remove utils

* Update get_data_from_tensor.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* remove utils

* Update rnn_functor.h

* remove utils

* remove utils

* remove utils

* remove utils

* remove utils

* Update rnn_functor.h

* Update unsqueeze_op.h

* Update utils.h

* roll back

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* add TensorToVector

* roll back

* Update tensor_utils.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update tensor_utils.h

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* TensorCopySync to phi::Copy

* fix codestyle

* rnn_kernel.cc: add ;

* replace all GetDataFromTensor with phi::GetVectorFromTensor

* delete include of util.h

Fix typos (#50894) · b8c06b6a
Committed by chenxujun on Feb 27, 2023

[Bug fix] Fix fp16 dtype checking for AvgPool1D op (#50929) · f8ec430e
Committed by Maple Xie on Feb 27, 2023

* Fix fp16 dtype checking for AvgPool1D op

* Update code style for PR-CI-Static-Check

[TRT] Add sm version check for TensorRT flash attention and cross attention pass/plugin (#50830) · 38dad3b9
Committed by Wang Bojun on Feb 27, 2023

* add sm version check

* use GetGPUComputeCapability

support fp16 on temporal_shift (#50919) · 12075f2a
Committed by 张春乔 on Feb 27, 2023

[Tensor Operants & Prim] Tensor pow API uses elementwise_pow (#50886) · 8a097399
Committed by HongyuJia on Feb 27, 2023

* [Tensor Operants & Prim] Tensor pow API uses elementwise_pow

* unittest change to fill_constant+elementwise_pow

[fp16] support fp16 on AvgPool3D (#50920) · 659cede0
Committed by 张春乔 on Feb 27, 2023

* support fp16 on AvgPool3D

* Apply suggestions from code review

support fp16 on AlphaDropout (#50917) · 3678cae2
Committed by 张春乔 on Feb 27, 2023

support fp16 on unbind (#50916) · 5f60b597
Committed by 张春乔 on Feb 27, 2023

suppot fp16 in gather_nd (#50909) · 336cd205
Committed by 张春乔 on Feb 27, 2023

suppot fp16 in flatten (#50906) · 7ffbf7e3
Committed by 张春乔 on Feb 27, 2023

suppot fp16 in broadcast (#50905) · 77298931
Committed by 张春乔 on Feb 27, 2023

fix fp16 dtype checking for clip op (#50878) · d832a54d

Committed by haozi on Feb 27, 2023

* fix fp16 dtype checking for clip op

* modify the name

* fix type error

* fix check error

* Update test_clip_op.py

fix test error

* Update test_clip_op.py

fix code style

---------
Co-authored-by: Zhang Ting <Douyaer2020@qq.com>

fix fp16 dtype checking for conj op (#50868) · 6b85eb59
Committed by Infinity_lee on Feb 27, 2023

[Error Msg] Polish error message when GPU kernel not found (#50880) · 3e9ffaef
Committed by HongyuJia on Feb 27, 2023

* [Error Msg] Polish error message when GPU kernel not found

* Only test in GPU environment

[bug fix] fix fp16 dtype checking for argmax op (#50811) · f3aec871
Committed by Zhang Ting on Feb 27, 2023

* fix fp16 dtype checking for argmax op

* run fp16 test when place is gpu

* Update search.py

fix doc

[fp16] fix fp16 support for nn.PairwiseDistance (#50849) · 587120ec
Committed by Ainavo on Feb 27, 2023

fix fp16 dtype checking for paddle.diag API (#50848) · ebea0885
Committed by 陈沧夜 on Feb 27, 2023

[fp16] suppot fp16 input in nansum (#50847) · 9951b86f

Committed by 张春乔 on Feb 27, 2023

* add float16 in python/paddle/math

* add unittest for float16

* add float16 support in python.paddle.tensor.search.where

* remove fp16 error cases

* Add NotImplementedError unittest

* fix codestyle

* fluid to paddle.static; add cases with GPU

* Add float16 in English docs

Reduce redundant cpu computation in slice compute (#50348) · 8aec0580
Committed by Bo Zhang on Feb 27, 2023

* conflict

* add UpdateSliceAttrs

change message info (#50546) · 097402d9
Committed by gaoziyuan on Feb 27, 2023

revert operator.cc (#50895) · ec814cf5
Committed by csy0225 on Feb 27, 2023

add prim test for sqrt and exp (#50942) · cf209204
Committed by Charles-hit on Feb 27, 2023

[kunlun] support reduce_scatter (#50792) · 6786c012
Committed by jameszhang on Feb 27, 2023

* [kunlun] support reduce_scatter

* uncomment unittest

* update xccl to 1.0.10

Add PADDLE_THROW in ToCudaDataType and polish codes. (#50922) · 2eeaaa7d
Committed by Yiqun Liu on Feb 27, 2023

revert reshape 0 represent copy and support perm < 0 for paddle.transpose (#50720) · 3669868d
Committed by zhouweiwei2014 on Feb 27, 2023

[IR] Type system stage2: add class Type, type uniquer utils, class IRContext (#50412) · a5827f0e

Committed by zhangbo9674 on Feb 27, 2023

* add TypeUniquer and IrContext

* refine include code

* add Type, TypeBase

* add built-in type

* add bulit-in Float32Type

* refine ut

* refine code

* refine code

* delete type_base

* rename ImplType to StorageType

* rename ImplType to StorageType

* add macros util for register type

* add macros util for register type

* refine name

* refine name

* change storage manager

* add multi_thread for ir_ctx

* rwlock_2_spinlock, add REGISTER_TYPE_2_IRCONTEXT

* DECLARE_TYPE_UTILITY_FUNCTOR

* refine ircontext singleton

* del destructor for ParametricStorageManager

* refine code

* Add necessary logs for debugging

* refine ir_context instance

* refine type get interface

* refine code by comment

xpu: bind op scatter_nd_add. add data type for transpose2, clip & assign_value (#50825) · 0d12afea
Committed by wangshengxiang on Feb 27, 2023

* [XPU] bind op scatter_nd_add

* [XPU] add more data type for op: clip, transpose2 & assign_value

[AutoParallel] add dist_attr in data_parallel optimization (#49744) · a36cdd6b
Committed by zhaoyingli on Feb 27, 2023

* fix dist_attr in data_parallel in optimization

* fix grad_clip pass when pp2

* fix dist_attr

[Bfloat16]register bfloat16 datatype for squared l2 norm (#50908) · 3c121040

Committed by shaojie_wang on Feb 26, 2023

* register bfloat16 datatype for squared l2 norm

* register bfloat16 datatype for softmax with upper triangular mask

* register bfloat16 for tril triu cuda kernel

[mv fleet] mv fleet to distributed (#50834) · 5d322ced

Committed by wangzhen38 on Feb 27, 2023

* [mv fleet] mv fleet to distributed

* [mv fleet] for ci

* [mv fleet] for ci

* [mv fleet] solve ci of version

[AutoParallel] fix set_grad_var_shape (#50722) · 76c495d7
Committed by zhaoyingli on Feb 27, 2023

* fix set_grad_var_shape

* recover modify

Feb 26, 2023 · 2 commits

Matmul performance optimization with cuBlasLt (#46431) · d4217fc6

Committed by limingshu on Feb 26, 2023


* implement of matmul using cublasLt instead of cublas

* Update matmul_kernel_impl_via_blasLt.h

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

Enable matmul + bias fusion in fused_gat_attention. (#50755) · 57f6a469

Committed by Yiqun Liu on Feb 26, 2023

* Enable matmul + bias fusion in fused_gat_attention.

* Add a variable to control whether using fused matmul + bias.

Feb 25, 2023 · 2 commits

Support 0D for equal tensor with scalar (#50857) · 7c73910e
Committed by zhouweiwei2014 on Feb 25, 2023

change outputs and grads from fp16-fp16-comparision and fp16-fp32 (#50700) · 2dec64d0

Committed by Vvsmile on Feb 25, 2023

* change outputs and grads from fp16-fp16-comparision and fp16-fp32
comparision

* support grad comparision fp16-fp32

* the change of reference dtype only occured from np.float16 to np.float32

* fix the list type can not infer the dtype by attribute dtype by transfer
the list to array

* adjust the default atol and rtol of float16 to 1e-3

* Polish code

* fix error

* fix

* Polish code

* fix the _is_cal_ref and np.float16

* fix the combination of is_calc_ref and np.float16

* remove unuseful codes in op_test.py

* fix ci

* fix the rtol set in the dygraph checker and eager checker

---------
Co-authored-by: ZzSean <18818272991@163.com>
