Commits · 74446b37f23622a65fab2f7b23dfa46cb062a398 · PaddlePaddle / Paddle

Feb 28, 2023 · 12 commits
- [C++ API GetAllocator] Add C++ `GetAllocator` interface (#50813) · 74446b37
  Committed by HongyuJia on Feb 28, 2023
```
* [C++ API GetAllocator] Add C++ GetAllocator interface

* move api to accurate directory
```
  74446b37
- add cumsum prim backward (#50565) · ca2b6095
  Committed by GGBond8488 on Feb 28, 2023
```
* add cumsum prim backward

* skip aixs=None test case

* fix op generante eror

* fix static test error

* remove unused code

* fix static test error

* skip cpu float16 test case

* skip eager cpu cumsum float16 test case

* add cinn test

* reshape flatten out

* Disable cinn single test

* remove cinn test

* reformat todo

* add prim in cumsum op test

* remove old test

* fix typro

* fix typro

* fix typro

* pass axis=None test case

* remove forward prim test

* remove same name axis
```
  ca2b6095
- [XPU] support convert fp16 model (#50790) · f265a313
  Committed by zhupengyang on Feb 28, 2023
  
  f265a313
- xpu gaussian_random support fp16 (#50881) · 569b018e
  Committed by shentanyue on Feb 28, 2023
  
  569b018e
- fix bug in fused_gemm_epilogue_op.cc (#50980) · 064a5434
  Committed by yuehuayingxueluo on Feb 28, 2023
  
  064a5434
- [fp16] suppot fp16 on nn.Dropout2D (#50904) · bf05168c
  Committed by 张春乔 on Feb 28, 2023
```
* add unittest for nn.DropOut2D

* add fp16

* add fp16 in docs of temporal_shift_op.cc

* Update test_dropout_op.py
```
  bf05168c
- forbid tensorrt_engine op's output is a persistable var (#50932) · bbf2bc2b
  Committed by zhoutianzi666 on Feb 28, 2023
```
* forbid tensorrt_engine op's output is a persistable var
```
  bbf2bc2b
- xpu-paddlepaddle-57 [Task] adamw lr_radio support (#50979) · dda74715
  Committed by taixiurong on Feb 28, 2023
  
  dda74715
- fix gflags from environment not activated (#50864) · a8fff38f
  Committed by Yuanle Liu on Feb 28, 2023
  
  a8fff38f
- fix concat axis bug (#50951) · 75a2f9d5
  Committed by wenbin on Feb 28, 2023
```
* fix concat bug

* recommit for ci
```
  75a2f9d5
- Count the number of 0 in the output Tensor (#50981) · 6c471ed0
  Committed by niuliling123 on Feb 28, 2023
  
  6c471ed0
- 【Prim】Reshape, transpose, cast vjp (#50778) · ab1b6303
  Committed by Jiabin Yang on Feb 28, 2023
```
* support transpose and reshape

* support reshpe, transpose, cast vjp

* merge develop

* recover unused file

* remove prim base

* support problem

* remove additional status settting

* remove additional status settting

* fix ut

* fix ut

* fix ut

* fix no grad branch

* add more test

* disable fp16 in cpu

* fix test
```
  ab1b6303
Feb 27, 2023 · 18 commits

[CINN] fix cinn cache key should save var name bug (#50955) · f78b4079
Committed by jiangcheng on Feb 27, 2023

f78b4079

Add inferface of get registered phi kernels (#50814) · 0f8c304a

Committed by zyfncg on Feb 27, 2023

* add inferface of get registered phi kernels

* change KernelType to KernelKey

* add test

* refactor code

0f8c304a

[XPU] add fp16 support for shape and lookup_table_v2 op. (#50773) · d2a0577a

Committed by houj04 on Feb 27, 2023

* [XPU] add fp16 support for shape op.

* [XPU] add fp16 support for lookup_table_v2 op.

* update approval list: add qingshu's id.

d2a0577a

handle trt engine deserialization failure and rebuild (#50775) · 377cbcea
Committed by Zhang Jun on Feb 27, 2023

377cbcea

【Hackathon No.68】Remove utils in phi (#50833) · 6c181d1d

Committed by 张春乔 on Feb 27, 2023

* remove utils

* remove utils

* remove utils

* remove utils

* Update get_data_from_tensor.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_kernel.cu.cc

* Update rnn_kernel.cc

* remove utils

* Update rnn_functor.h

* remove utils

* remove utils

* remove utils

* remove utils

* remove utils

* Update rnn_functor.h

* Update unsqueeze_op.h

* Update utils.h

* roll back

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* Update tensor_utils.h

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* use TensorToVector

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* add TensorToVector

* roll back

* Update tensor_utils.h

* Update rnn_functor.h

* Update rnn_grad_kernel.cu.cc

* Update tensor_utils.h

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* Update rnn_grad_kernel.cu.cc

* Update rnn_kernel.cu.cc

* Update rnn_grad_kernel.cc

* Update rnn_kernel.cc

* TensorCopySync to phi::Copy

* fix codestyle

* rnn_kernel.cc: add ;

* replace all GetDataFromTensor with phi::GetVectorFromTensor

* delete include of util.h

6c181d1d

[TRT] Add sm version check for TensorRT flash attention and cross attention pass/plugin (#50830) · 38dad3b9
Committed by Wang Bojun on Feb 27, 2023
```
* add sm version check

* use GetGPUComputeCapability
```
38dad3b9
[Tensor Operants & Prim] Tensor pow API uses elementwise_pow (#50886) · 8a097399
Committed by HongyuJia on Feb 27, 2023
```
* [Tensor Operants & Prim] Tensor pow API uses elementwise_pow

* unittest change to fill_constant+elementwise_pow
```
8a097399
[Error Msg] Polish error message when GPU kernel not found (#50880) · 3e9ffaef
Committed by HongyuJia on Feb 27, 2023
```
* [Error Msg] Polish error message when GPU kernel not found

* Only test in GPU environment
```
3e9ffaef
Reduce redundant cpu computation in slice compute (#50348) · 8aec0580
Committed by Bo Zhang on Feb 27, 2023
```
* conflict

* add UpdateSliceAttrs
```
8aec0580
change message info (#50546) · 097402d9
Committed by gaoziyuan on Feb 27, 2023

097402d9
revert operator.cc (#50895) · ec814cf5
Committed by csy0225 on Feb 27, 2023

ec814cf5
[kunlun] support reduce_scatter (#50792) · 6786c012
Committed by jameszhang on Feb 27, 2023
```
* [kunlun] support reduce_scatter

* uncomment unittest

* update xccl to 1.0.10
```
6786c012
Add PADDLE_THROW in ToCudaDataType and polish codes. (#50922) · 2eeaaa7d
Committed by Yiqun Liu on Feb 27, 2023

2eeaaa7d
revert reshape 0 represent copy and support perm < 0 for paddle.transpose (#50720) · 3669868d
Committed by zhouweiwei2014 on Feb 27, 2023

3669868d

[IR] Type system stage2: add class Type, type uniquer utils, class IRContext (#50412) · a5827f0e

Committed by zhangbo9674 on Feb 27, 2023

* add TypeUniquer and IrContext

* refine include code

* add Type, TypeBase

* add built-in type

* add bulit-in Float32Type

* refine ut

* refine code

* refine code

* delete type_base

* rename ImplType to StorageType

* rename ImplType to StorageType

* add macros util for register type

* add macros util for register type

* refine name

* refine name

* change storage manager

* add multi_thread for ir_ctx

* rwlock_2_spinlock, add REGISTER_TYPE_2_IRCONTEXT

* DECLARE_TYPE_UTILITY_FUNCTOR

* refine ircontext singleton

* del destructor for ParametricStorageManager

* refine code

* Add necessary logs for debugging

* refine ir_context instance

* refine type get interface

* refine code by comment

a5827f0e

xpu: bind op scatter_nd_add. add data type for transpose2, clip & assign_value (#50825) · 0d12afea
Committed by wangshengxiang on Feb 27, 2023
```
* [XPU] bind op scatter_nd_add

* [XPU] add more data type for op: clip, transpose2 & assign_value
```
0d12afea

[Bfloat16]register bfloat16 datatype for squared l2 norm (#50908) · 3c121040

Committed by shaojie_wang on Feb 26, 2023

* register bfloat16 datatype for squared l2 norm

* register bfloat16 datatype for softmax with upper triangular mask

* register bfloat16 for tril triu cuda kernel

3c121040

[mv fleet] mv fleet to distributed (#50834) · 5d322ced

Committed by wangzhen38 on Feb 27, 2023

* [mv fleet] mv fleet to distributed

* [mv fleet] for ci

* [mv fleet] for ci

* [mv fleet] solve ci of version

5d322ced

Feb 26, 2023 · 2 commits

Matmul performance optimization with cuBlasLt (#46431) · d4217fc6

Committed by limingshu on Feb 26, 2023


* implement of matmul using cublasLt instead of cublas

* Update matmul_kernel_impl_via_blasLt.h

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

d4217fc6

Enable matmul + bias fusion in fused_gat_attention. (#50755) · 57f6a469

Committed by Yiqun Liu on Feb 26, 2023

* Enable matmul + bias fusion in fused_gat_attention.

* Add a variable to control whether using fused matmul + bias.

57f6a469

Feb 25, 2023 · 1 commit
- Rename elementwise_heaviside to heaviside (#50821) · 8129c22e
  Committed by zyfncg on Feb 25, 2023
```
* rename elementwise_heaviside to heaviside

* delete __init__.py

* fix bug
```
  8129c22e
Feb 24, 2023 · 7 commits
- [Zero-Dim] Support 0D Tensor input for topk/broadcast_to/expand/expand_as/broadcast_shape (#50536) · 5041158f
  Committed by yunyaoXYY on Feb 24, 2023
  
  5041158f
- [Paddle-TRT] allow plugin fall back to fp16 when int8 (#50554) · f24eadd9
  Committed by zhoutianzi666 on Feb 24, 2023
```
* allow fall back to fp16 when int8

* refine code

* refine code

* refine code
```
  f24eadd9
- Fused ops converter (#50751) · 9429936c
  Committed by Sławomir Siwek on Feb 24, 2023
```
* ConvertToFusedOp

* change static to inline
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
```
  9429936c
- Fix KP operator Kernel selection error (#50178) · 6ef3f2ce
  Committed by niuliling123 on Feb 24, 2023
  
  6ef3f2ce
- 【Prim】Fix prim amp (#50518) · 6664a232
  Committed by Jiabin Yang on Feb 24, 2023
```
* change amp with to_prim

* fix prim amp

* fix rules

* fix liear

* add amp test

* add test

* disable this test on cpu

* disable this test on cpu

---------
Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>
```
  6664a232
- fix composite grad maker code gen (#50854) · 07c416c8
  Committed by Charles-hit on Feb 24, 2023
  
  07c416c8
- Fix libpaddle_inference.so symbol conflicts with other .so (gflags) (#50787) · 041ea14c
  Committed by Yuanle Liu on Feb 24, 2023
  
  041ea14c

PaddlePaddle / Paddle · synced successfully about 1 year ago