提交 · 71e28b12815018a8420962fc6a20a3526085938d · PaddlePaddle / Paddle

31 8月, 2023 1 次提交

Add fused_scale_bias_relu_conv_bnstats OP (#55026) · 71e28b12

由 Tian Zheng 提交于 8月 31, 2023

* Add fused_scale_bias_relu_conv_bnstats op

* Review changes

* Fix no CUDNN Frontend build

* Fix PADDLE_ENFORCE format

* Fix PADDLE_ENFORCE CI error

* Rename kernel filename

* Refactor unittest to use paddle eager_op_test

* Fix padding bugs

* Review changes

* test=cuda117

* test=cuda117

71e28b12

10 8月, 2023 1 次提交

Add variable_length_memory_efficient_attention (#55400) · 4036c937

由 lzy 提交于 8月 10, 2023

* add variable_length_memory_efficient_attention
* update variable_length_memory_efficient_attention unittest
* update variable_length_mem_eff_attn's docs and unittest
* update variable_length_mem_eff_attn's docs
* Update test_variable_length_memory_efficient_attention.py
* Update variable_length_memory_efficient_attention.cu
* fix codestyle
* fix variable_length_fmha's docs and unittest
* fix variable_length_fmha's docs

4036c937

08 8月, 2023 1 次提交
- H
  
  move dgc kernel to phi (#56003) · 3c03ade8
  由 huangjiyi 提交于 8月 08, 2023
  
  3c03ade8
31 7月, 2023 1 次提交
- W
  Support stride2 (#55156) · 859fc01b
  由 wanghuancoder 提交于 7月 31, 2023
```
support stride
```
  859fc01b
01 6月, 2023 1 次提交
- Y
  
  fix xpu-kp bugs (#54234) · e8735ddf
  由 YuanRisheng 提交于 6月 01, 2023
  
  e8735ddf
26 5月, 2023 1 次提交

[PHI Decoupling]Create PHI shared lib (#53735) · da50a009

由 YuanRisheng 提交于 5月 26, 2023

* create phi so

* fix ci bugs

* fix py3 bugs

* add file

* fix py3 bugs

* fix windows bugs

* perfect so

* fix py3 bugs

* delete all static target in phi

* fix windows bugs

* fix py3 bugs

* fix ci bugs

* fix windows bugs

* fix bugs: gflags can't be linked by dynamic and static lib

* fix bugs that can not load 3rd party

* fix ci bugs

* fix compile bugs

* fix py3 bugs

* fix conflict

* fix xpu bugs

* fix mac compile bugs

* fix psgpu bugs

* fix inference failed

* deal with conflict

* fix LIBRARY_PATH bug

* fix windows bugs

* fix onednn error

* fix windows compile bugs

* fix windows compile bugs

* fix test_cuda_graph_static_mode_error aborted

* fix windows bugs

* fix mac-python3 error

* fix hip compile bugs

* change mode to static

* change to static mode

* fix ci bugs

* fix py3 bugs

* fix windows bugs

* fix bugs

* add static flag

* add PADDLE_API

* change position of PADDLE_API

* fix windows bugs

* change mode to dynamic lib

* fix windows static bugs

* deal with conflict

* fix windows unit bug

* fix coverage

* deal with conflict

* fix windows-inference

* fix py3 bugs

* fix bugs when compile type_info

* fix compile bugs

* fix py3 bugs

* fix windows bugs

* fix windows openblas

* fix xpu bugs

* fix enforce_test in windows

* update code according comment

* fix windows cmake bug

* fix windows bugs

* fix windows bugs

* delete cinn unittest

* fix cinn bugs

---------
Co-authored-by: lzydev <1528794076@qq.com>

da50a009

24 5月, 2023 1 次提交
- Z
  
  move reduce raw kernels to legacy (#53961) · f488e3fd
  由 zhangyuqin1998 提交于 5月 24, 2023
  
  f488e3fd
18 5月, 2023 1 次提交
- H
  
  move fusion_group kernel to phi (#53781) · 26da689d
  由 huangjiyi 提交于 5月 18, 2023
  
  26da689d
15 5月, 2023 1 次提交
- C
  Reduce inference library size and compile time (#53369) · 0ef51804
  由 chalsliu 提交于 5月 15, 2023
```
* Reduce inference library size and compile time

* resolve conflicts
```
  0ef51804
09 5月, 2023 1 次提交

[PHI kernels] Bind XPU kernels (#53336) · 7e9c87c5

由 RuohengMa 提交于 5月 09, 2023

* bind sparse_coo_tensor, reduce_max/max_int32, range/arange_int32, equal_bool, scatter_grad_float32, nearest_interp_int64 kernels

* add more unit tests; modify compilation logic of xpu sparse kernels

7e9c87c5

06 5月, 2023 1 次提交

move UniformRawKernel to legacy (#53158) · 13e2e10c

由 zhangyuqin1998 提交于 5月 06, 2023

* move UniformRawKernel to legacy

* Update uniform_kernel.cc

* Update uniform_kernel.cu

* Update uniform_kernel.cc

* Update uniform_kernel.cu

* Update uniform_kernel.h

* Update uniform_kernel.cc

* Empty Commit to setup deployments

13e2e10c

20 4月, 2023 1 次提交

move_elementwise_raw (#53010) · 7a72f7a2

由 zhangyuqin1998 提交于 4月 20, 2023

* setup

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* fix

* fix

* Update elementwise_kernel.cu

* fix

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

7a72f7a2

17 4月, 2023 1 次提交

[Paddle-Inference] Add cutlass conv2d_depthwise (#51792) · bd3b096a

由 zhoutianzi666 提交于 4月 17, 2023

* initial commit for cutlass_teller

* second commit for cutlass_teller

* add conv2d_depthwise python template

* add conv2d_depthwise cutlass template

* /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h

* refine code in Conv2dFusionCanSupport

* add macro in cutlass_teller.h

* add 3x3 5x5 teller

* add groups not 1 or conv2d_depthwise teller

* 只生成ic是8的倍数的conv2d_depthwise 的kernel

* add EXPLICIT in cutlass_teller.h

* final commit

* add split_k_slices in conv2d_depthwise

* make stages == 2

* 重构部分代码

* add CutlassFusionType

* solve illegal memory

* make stride_h=stride_w && make dilation==1

* must check HasAttr(use_cutlass) before GetAttrIfExists

* add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String

* modify decl.h and util.cu

bd3b096a

14 4月, 2023 1 次提交

[phi] move sequence_pool to phi - Step 2 : sequence_pool_op (#52750) · b281b221

由 gouzil 提交于 4月 14, 2023

* [phi] move sequence_pool kernel to phi

* [phi] mv sequence_pooling to phi funcs

* [phi] mv sequence_pooling_test

* [phi] RollBACK `paddle/fluid/operators/sequence_ops/sequence_pool_op.cc`

* [phi][funcs] fix mutable_data

* [phi][funcs] fix mutable_data

b281b221

04 4月, 2023 1 次提交
- Y
  
  fix xpu compile bugs (#52501) · 81054ad4
  由 YuanRisheng 提交于 4月 04, 2023
  
  81054ad4
31 3月, 2023 1 次提交
- Y
  [PHI Decoupling]Remove distribute header (#52202) · e923642e
  由 YuanRisheng 提交于 3月 31, 2023
```
* remove distribute

* fix py3 bugs

* fix gpu-ps bugs

* fix compile bugs

* fix unittest bugs
```
  e923642e
29 3月, 2023 1 次提交
- S
  Fix generate_kernels.py in CUDA 12.0 (#52232) · f2c96bc2
  由 sneaxiy 提交于 3月 29, 2023
```
* fix generate_kernels.py in CUDA 12.0

* fix attrs bug
```
  f2c96bc2
24 3月, 2023 1 次提交

Memory Efficient Attention (#51867) · e5ad3859

由 ZhangDY-6483 提交于 3月 24, 2023

* first version, notest

* return final rst, notest

* use infinity() instead of max

* ut structure

* start up of ut

* generate lse

* update

* add depense

* reconstruct cmake

* move file

* add memory efficient attention and fix blasimpl

* update

* update cmake

* add namespace

* update cmake

* use .cu

* update for pad3d

* bug fix

* bug fix

* update

* bug fix

* update enforce

* add test case

* merge the lse pad

* fix kernel_fn of backward

* fix PADDLE_ENFORCE_EQ and phi_api

* fix PADDLE_ENFORCE

* fix PADDLE_ENFORCE

* rerun coverage

* fix memory efficient attention test

* rerun ci

* add cuda version condition

* add cuda version condition

* delete WIP test

* replace PADDLE_ENFORCE

* edit the namespace of datatype in multiple.cc

* rerun

* rerun

---------
Co-authored-by: Nliuyuang <liuyuang@baidu.com>

e5ad3859

22 3月, 2023 2 次提交

S

add fused dropout add (#51752) · 6ba0507d
由 ShenLiang 提交于 3月 22, 2023

6ba0507d

Add fused_linear_param_grad_add_kernel (#51805) · f59c5d8b

由 sneaxiy 提交于 3月 22, 2023

* add fused_linear_param_grad_add_kernel

* fix compile error

* remove flag

* fix ci compile error

* fix ci compile error

* revert pylayer revision

* fix ci ut

* improve performance

f59c5d8b

16 3月, 2023 1 次提交

Update from_blob API (#51646) · c07c7712

由 Huang Jiyi 提交于 3月 16, 2023

* remove contexts in tensor_utils

* update from_blob

* update from_blob

* update from_blob

* fix bug

* fix bug

c07c7712

15 3月, 2023 1 次提交
- U
  
  [cutlass] Fix make (#51718) · c665400b
  由 umiswing 提交于 3月 15, 2023
  
  c665400b
13 3月, 2023 1 次提交

[Paddle Inference ]use python to generate cutlass code (#50603) · 4e9e23cb

由 zhoutianzi666 提交于 3月 13, 2023

* use python to generate cutlass code

* refine CommonConvKernelPart1, CommonConvKernelPart2

* remove useless code in generate_cutlass_code.sh

* add more config in conv2d_residual

* CommonCutlassConvKernelPart1 and CommonCutlassConvKernelPart2

* add group conv support in util.cu

* remove .sh

* refine name

* make name goodgit status!

* add fuse_alpha

* make code easy to understand

* mot fopen generate in py

* use python script to generate conv2d,group=1 cutlass code

* use const &

* use const & && use python script to generate conv2d/group=1 code

4e9e23cb

10 3月, 2023 1 次提交

[New features]Add function node in phi_kernel for MKLDNN (#51073) · a0a6dc6a

由 HappyHeavyRain 提交于 3月 10, 2023

* Add function node in phi_kernel for MKLDNN

* fix the bug in 'BuildInferVarKernelContext'

* add infer_varkernel_utils.cc

* fix the bug:the first two parametes of 'BuildInferVarKernelContext' can't be template variable

* change the code according to first review

* change the code according to first review

* change the mode of paddle_build.sh

* change 'infer_var_kernel_fn_' to 'get_kerneltype_forvar_fn_'

* add the error information

* fix NotFound infomation warning

* fix NotFound infomation warning

* fix NotFound infomation warning

a0a6dc6a

09 3月, 2023 1 次提交

Add comm context manager, add phi broadcast op (#51072) · c191b707

由 TaoTao Li 提交于 3月 09, 2023

* * add comm context for device context

* add broadcast phi operator kernel and api

* add broadcast support dtype, update ut

* fix broadcast bfloat16 type

* fix ut

* update test_collective_broadcast_api timeout to 300

c191b707

01 3月, 2023 1 次提交

Integration flash attention (#49869) · 61611786

由 Chitsing KUI 提交于 3月 01, 2023

* flash attn

* seed

* almost

* softmax

* fix workspace

* add unitest; linux only

* fix setup

* fix datatype include

* fix setup typo

* fix def scope

* new error api

* use paddle fork

* fix attr bug; complete ut

* update flash hash

* fix rng reset

* fix offset

* fix comments

61611786

16 2月, 2023 1 次提交

[phi decoupling] remove variable.h in phi (#50407) · 905cefd4

由 Huang Jiyi 提交于 2月 16, 2023

* move variable_utils from phi_api_utils to fluid

* fix coment

* update include

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* update

* update

* fix CI-Windows-OpenBLAS

* fix bugs

* fix bugs

* fix bugs

* update include

* move variable_utils to phi_utils

* fix namespace

905cefd4

03 1月, 2023 1 次提交

[Paddle Inference] Implement conv2d_fusion NHWC format using cutlass (#47989) · c123dd1e

由 zhoutianzi666 提交于 1月 03, 2023

* Implement conv2d_fusion NHWC format using CUTLASS
* Add unit testing for CUTLASS Conv in inference
* Add experimental API for CUTLASS.

c123dd1e

23 12月, 2022 1 次提交
- H
  add rnn-t loss and api (#49199) · c088f9ec
  由 Hui Zhang 提交于 12月 23, 2022
```
* add warp transducer code
```
  c088f9ec
22 12月, 2022 1 次提交
- X
  
  [Paddle Inference] Add moe phi kernel (#48703) · def2a87f
  由 xiaoxiaohehe001 提交于 12月 22, 2022
  
  def2a87f
19 12月, 2022 1 次提交
- H
  [PHI decoupling] move gather_scatter_kernel from fluid to phi (#49132) · 0b79129d
  由 huangjiyi 提交于 12月 19, 2022
```
* move gather_scatter_kernel from fluid to phi

* mv gather_scatter_kernel to gather_scatter_functor
```
  0b79129d
17 12月, 2022 1 次提交
- W
  
  refactor: rename xccl files (#49127) · d4f43ad4
  由 Wen Sun 提交于 12月 17, 2022
  
  d4f43ad4
16 12月, 2022 1 次提交
- W
  
  refactor: rename files (#49117) · 40f3f4f0
  由 Wen Sun 提交于 12月 16, 2022
  
  40f3f4f0
06 12月, 2022 1 次提交

Clear extra input (Bias, ResidualData) in OpMaker of conv2d (#47579) · 0a2dfa38

由 zyfncg 提交于 12月 06, 2022

* delete Bias and ResidualData in OpMaker of conv2d

* delete extra input of conv3d

* refactor pass of conv_bias_fusion

* fix mkldnn dependency

* fix mkldnn compile

* fix test_conv_bias_mkldnn_fuse_pass

* police some code

* remove useless log

* fix analyzer_vit_ocr_tester

* fix conv_activation_mkldnn_fuse_pass

* fix test_analyzer_ocr

* add fused_conv_sig

* fix performence regression

* fix performance regression

0a2dfa38

05 12月, 2022 1 次提交
- H
  
  move device_memory_aligment from fluid to phi (#48694) · 796499fd
  由 huangjiyi 提交于 12月 05, 2022
  
  796499fd
18 11月, 2022 1 次提交

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

由 Tian Zheng 提交于 11月 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in  conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* - move plan building outside of cache

* Fix ROCM build

14a6e67b

31 10月, 2022 1 次提交
- R
  [CustomDevice] GetCCLComm add custom device support (#47168) · 34d13d6a
  由 ronnywang 提交于 10月 31, 2022
```
* [CustomDevice] GetCCLComm add custom device support

* update

* update

* update
```
  34d13d6a
20 10月, 2022 1 次提交
- J
  Add infer prune function (#47046) · af9486fc
  由 JingZhuangzhuang 提交于 10月 20, 2022
```
* Add infer prune function

* Update phi.cmake

* Update operators.cmake

* add fusion op
```
  af9486fc
19 9月, 2022 1 次提交
- Z
  [Sparse] Add infer meta (#46016) · 4b95f85e
  由 zhangkaihuo 提交于 9月 19, 2022
```
* sparse infer_meta
```
  4b95f85e
09 9月, 2022 1 次提交
- C
  [Phi] Add fusion kernel dir and migrate fused_softmax_mask op (#45802) · 2b4f44d5
  由 Chen Weihang 提交于 9月 09, 2022
```
* add fusion dir and fuse_softmax_mask kernel

* remove fusion kernel dir

* migrate infershape

* fix code errror
```
  2b4f44d5

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功