Commits · ec857b850dd2f019ab3e658a920a878b8ca53630 · BaiXuePrincess / Paddle

05 Jan, 2023 · 1 commit
- Add transpose_qkv_wb flags to the fused_attention_op. (#49494) · ec857b85
  Committed by Yuang Liu on Jan 05, 2023
04 Jan, 2023 · 3 commits

[Inference] Add conv_fusion nhwc impl. (#49047) · 4a8708bb
Committed by Wilber on Jan 04, 2023

[Paddle Inference] fix mixed precision diff (#49475) · ac75a9a6
Committed by Yuanle Liu on Jan 04, 2023

[Unify KernelKey] change OpKernelType->KernelKey (#49138) · 4383494f
Committed by HongyuJia on Jan 04, 2023

* execute use kernel_key first

* change OpKernelType->KernelKey

* fix py3 compile error, remove redundant header files

* fix build_strategy_test

* fix DataType::RAW

* fix custom_type test: operator_test.cc

* fix transform place

* fix backends_are_same_class

* try fix place TransDataDevice

* support all KernelKey

* fix TransformData

* fix place_are_same_class

* fix merge

* fix test_params_no_grad

* fix specific place of GetExpectedKernelType

* fix specific place of GetExpectedKernelType

* fix GetKernelTypeForVar

* fix dtype error

* fix fetch_v2

* change GetKernelTypeForVar

* fix interpreter

* fix typo error

* polish codes

* polish codes

* polish codes

* fix conflict


03 Jan, 2023 · 1 commit

[Paddle Inference] Implement conv2d_fusion NHWC format using cutlass (#47989) · c123dd1e
Committed by zhoutianzi666 on Jan 03, 2023

* Implement conv2d_fusion NHWC format using CUTLASS
* Add unit testing for CUTLASS Conv in inference
* Add experimental API for CUTLASS.

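
Several commits in this log (c123dd1e, 30f4ef7f, 4a8708bb) add NHWC support to conv2d_fusion. The sketch below illustrates what that layout change means for memory order. It reorders a flat NCHW buffer into NHWC order using the standard row-major offset formulas. The helper names (`nchw_offset`, `nhwc_offset`, `nchw_to_nhwc`) are hypothetical and written for this note. They are not Paddle APIs, and the sketch is not the CUTLASS-based kernel.

```python
def nchw_offset(n, c, h, w, C, H, W):
    # Row-major offset of element (n, c, h, w) in an NCHW tensor.
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    # Row-major offset of the same logical element in an NHWC tensor.
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(buf, N, C, H, W):
    """Hypothetical helper: reorder a flat NCHW buffer into a flat NHWC buffer."""
    if len(buf) != N * C * H * W:
        raise ValueError("buffer length does not match N*C*H*W")
    out = [None] * len(buf)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    out[nhwc_offset(n, c, h, w, C, H, W)] = buf[nchw_offset(n, c, h, w, C, H, W)]
    return out


# Two channels over a 1x2 image: channel values become interleaved per pixel.
print(nchw_to_nhwc([0, 1, 2, 3], N=1, C=2, H=1, W=2))  # [0, 2, 1, 3]
```

In NHWC, all channels of one pixel are contiguous. That contiguity is what channel-last convolution kernels exploit when reading inputs.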

29 Dec, 2022 · 2 commits
- fix ambiguous symbol error (#49406) · 6f07960c
  Committed by MarDino on Dec 29, 2022
- fused_attention_op paratmers stop grad support (#49351) · 0bb999b6
  Committed by Wang Bojun on Dec 29, 2022
```
* fusedAttenGrad_noGrad

* code style fix

* add ut

* remove unnecessary log
```
23 Dec, 2022 · 1 commit
- make FusedMultiTransformer supports RoPE (#48842) · 644dfc60
  Committed by lzy on Dec 23, 2022
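
Commit 644dfc60 adds rotary position embedding (RoPE) to FusedMultiTransformer. The following is a minimal sketch of the general technique on a single head vector. It assumes the interleaved-pair convention and a base of 10000. The fused kernel's actual memory layout and pairing convention are not described in this log and may differ. `apply_rope` and `dot` are hypothetical names.

```python
import math


def apply_rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos_t - b * sin_t
        out[2 * i + 1] = a * sin_t + b * cos_t
    return out


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


q, k = [0.1, 0.2, 0.3, 0.4], [0.5, -0.6, 0.7, 0.8]
# The score depends only on the relative offset (here 2) between the two positions.
print(abs(dot(apply_rope(q, 5), apply_rope(k, 3)) - dot(apply_rope(q, 9), apply_rope(k, 7))) < 1e-9)  # True
```

Each pair is rotated rather than shifted. As a result, vector norms are preserved and q·k depends only on the position difference, which is the property RoPE is designed to provide.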
20 Dec, 2022 · 1 commit

[PHI decouple] move dropout_impl and cuda_graph_with_memory_pool from fluid to phi (#49139) · 579784e2
Committed by huangjiyi on Dec 20, 2022

* move dropout_impl from fluid to phi

* move cuda_graph_with_memory_pool from fluid to phi

* update namespace

* remove cuda_graph in fluid

* fix mac-build

* fix bugs

* correct CodeStyle

* fix mac-build

* fix mutable_data

* fix stl include

* fix copy param


19 Dec, 2022 · 1 commit
- refactor: rename process group (#49137) · 22e416cf
  Committed by Wen Sun on Dec 19, 2022
16 Dec, 2022 · 1 commit
- refactor: rename files (#49117) · 40f3f4f0
  Committed by Wen Sun on Dec 16, 2022
15 Dec, 2022 · 2 commits

[PHI decoupling] move softmax from fluid to phi and remove cpu_vec.h in fluid (#48970) · 344b99e1
Committed by huangjiyi on Dec 15, 2022

[PHI decoupling] Remove fluid imports from MKLDNN code (#48981) · 4d5a5533
Committed by Sławomir Siwek on Dec 15, 2022

* fix wrong handler name

* mkldnn_engine -> onednn_engine

* remove fluid/errors.h imports

* remove fluid/enforce.h imports

* remove note and unnecessary import

* remove fluid/pretty_log.h imports

* remove fluid/place.h imports

* remove fluid/data_layout_transform.h imports

* remove fluid/device_context.h imports

* remove mkldnn_helper code

* remove fluid/mkldnn_reuse.h imports

* pretty_log import


14 Dec, 2022 · 2 commits
- Fix nullptr to TestFuseGemmEpilogueReluBWDFP* (#48997) · e61df289
  Committed by Ming-Xu Huang on Dec 14, 2022
- modify cmake file for cuda11.8 compile (#49020) · d0284f85
  Committed by zqw_1997 on Dec 14, 2022
```
* modify cmake file for cuda11.8 compile

* add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
```
13 Dec, 2022 · 1 commit

Save fused_attention op memory when dropout_rate = 0.0 (#48902) · 428fb804
Committed by sneaxiy on Dec 13, 2022

* save fused_attention memory when dropout_rate = 0.0

* add ut

* fix ut bug

* fix fused_layernorm_residual_dropout_bias_test.cu

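
Commit 428fb804 saves memory in fused_attention when dropout_rate = 0.0. Dropout with a zero rate is an identity, so no random mask has to be generated or kept around for the backward pass. The following is a minimal sketch of that idea, assuming the upscale-in-train convention. `dropout_forward` and `dropout_backward` are hypothetical helpers, not Paddle functions.

```python
import random


def dropout_forward(x, rate, rng):
    """Upscale-in-train dropout. Returns (out, mask); mask is None when rate == 0."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if rate == 0.0:
        # Identity: no random numbers drawn and no mask stored for backward.
        return list(x), None
    mask = [1 if rng.random() >= rate else 0 for _ in x]
    scale = 1.0 / (1.0 - rate)
    return [xi * m * scale for xi, m in zip(x, mask)], mask


def dropout_backward(dout, rate, mask):
    """Gradient of dropout_forward; uses the saved mask only if one exists."""
    if mask is None:
        return list(dout)
    scale = 1.0 / (1.0 - rate)
    return [g * m * scale for g, m in zip(dout, mask)]


out, mask = dropout_forward([1.0, 2.0, 3.0], 0.0, random.Random(0))
print(out, mask)  # [1.0, 2.0, 3.0] None
```

The memory saved is the mask buffer. In an attention layer, that buffer scales with batch × heads × seq × seq.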

12 Dec, 2022 · 1 commit

[PHI decoupling] move norm_utils.cu.h from fluid to phi and remove norm_utils.h in fluid (#48930) · 3cb8db8f
Committed by huangjiyi on Dec 12, 2022

* move norm_utils.cu.h from fluid to phi

* remove norm_utils.h in fluid

* fix bugs and replace mutable_data with Alloc

* replace mutable_data with Alloc


09 Dec, 2022 · 2 commits
- fix scale type in alpha and beta (#48887) · c1cadcca
  Committed by MarDino on Dec 09, 2022
- [PHI decoupling] move "flags.h" from fluid to phi (#48696) · 39ffef0d
  Committed by PuQing on Dec 09, 2022
08 Dec, 2022 · 1 commit
- first commit (#38143) · 2e7c172c
  Committed by limingshu on Dec 08, 2022
07 Dec, 2022 · 1 commit
- [phi::DenseTensor] Replace Tensor with phi::DenseTensor (#48682) · 65420271
  Committed by 张春乔 on Dec 07, 2022
06 Dec, 2022 · 1 commit

Clear extra input (Bias, ResidualData) in OpMaker of conv2d (#47579) · 0a2dfa38
Committed by zyfncg on Dec 06, 2022

* delete Bias and ResidualData in OpMaker of conv2d

* delete extra input of conv3d

* refactor pass of conv_bias_fusion

* fix mkldnn dependency

* fix mkldnn compile

* fix test_conv_bias_mkldnn_fuse_pass

* polish some code

* remove useless log

* fix analyzer_vit_ocr_tester

* fix conv_activation_mkldnn_fuse_pass

* fix test_analyzer_ocr

* add fused_conv_sig

* fix performance regression

* fix performance regression


05 Dec, 2022 · 2 commits
- Transpose optimization for AlphaFold2 (#45230) · a0f43889
  Committed by limingshu on Dec 05, 2022
```
* first commit

* fix bugs according to ci

* add some changes

* change file name into function.cu.h

* remove const_cast
```
- support nhwc in conv2d_fusion (#48642) · 30f4ef7f
  Committed by zhoutianzi666 on Dec 05, 2022
01 Dec, 2022 · 1 commit
- fuse-mt passes compatible with structured pruning (#48585) · a365024c
  Committed by minghaoBD on Dec 01, 2022
```
* fuse-mt passes compatible with structured pruning
```
30 Nov, 2022 · 4 commits

[PHI decoupling] migrate transpose_op.cu.h and gpu_utils.h to phi (#48286) · 8a9bef70
Committed by Netpunk on Nov 30, 2022

* migrate transpose_op.cu.h and gpu_utils.h

* format code style

* fix some problems

* format code

* reset transpose_op.cc

* test commit

* recover transpose_op.h

* delete transpose_op.h

* adjust header files order in transpose_op.cc

Support more activation in fused multi transformer (#48371) · 8a717a3e
Committed by MarDino on Nov 30, 2022
```
* add activation support
* fix cublasLt bug
* remove useless code and fix test random range
```

Add fuse_act_add_grad_pass (#48346) · ca552933
Committed by zhangbo9674 on Nov 30, 2022
```
* add fuse act add grad pass

* polish code

* refine code

* add test

* refine code
```

Add int8 support in fused_multi_transformer_pass and fuse_multi_transformer_layer_pass (#48209) · 12486712
Committed by RichardWooSJTU on Nov 30, 2022
```
* delete unnecessary shape and slice op
Co-authored-by: Your Name <you@example.com>
```

29 Nov, 2022 · 2 commits

fix mma_tensorcore (#48386) · bf4d1792
Committed by lzy on Nov 29, 2022

* fix mma_tensorcore (__CUDA_ARCH__)

* disable tensorcore by default.

Disable tensorcore by default, because the __CUDA_ARCH__ check can cause undefined behavior in some environments; it can be enabled manually on a machine that supports tensorcore.

[PHI decoupling] Move MKLDNN code (#48352) · fa051eec
Committed by Sławomir Siwek on Nov 29, 2022

28 Nov, 2022 · 4 commits
- fix: multihead matmul biasqk broadcast support for [1,1,seq,seq] shape (#47975) · 11b9d85f
  Committed by Wang Bojun on Nov 28, 2022
```
* add trt support
```
- [PHI decoupling] move several header files from fluid to phi (#48415) · fd9c91c3
  Committed by huangjiyi on Nov 28, 2022
```
* decouple cudnn_desc.h from fluid

* move cudnn_desc.h from fluid to phi

* fix bugs

* decouple cudnn_helper.h from fluid

* fix bugs

* move cudnn_helper.h from fluid to phi

* add fluid cudnn_helper.h

* move miopen_desc.h from fluid to phi

* move miopen_helper.h from fluid to phi

* fix bugs

* move gpu_dnn.h from fluid to phi

* fix bugs

* update copyright year

* simplify gpu_dnn.h in fluid

* fix bugs

* fix xpu build bug

* fix compile bug

* fix bug
```
- replace LoDTensor with phi::DenseTensor in fluid\operators\*\ except sequence_ops (#48418) · 30a31a53
  Committed by 张春乔 on Nov 28, 2022
- Use phi layernorm (#48276) · 86d92092
  Committed by MarDino on Nov 28, 2022
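
Commit 11b9d85f lets the multihead matmul bias_qk input have shape [1, 1, seq, seq], so one bias matrix is shared across the batch and head dimensions. The broadcasting rule can be sketched with plain nested lists. This sketch assumes attention scores laid out as [batch][head][seq][seq] and is not the TensorRT plugin code. `add_bias_qk` is a hypothetical name.

```python
def add_bias_qk(scores, bias):
    """Add bias_qk to attention scores, broadcasting size-1 batch/head dims.

    scores: nested list [B][H][S][S]; bias: [Bb][Hb][S][S] with Bb in (1, B)
    and Hb in (1, H).
    """
    B, H = len(scores), len(scores[0])
    Bb, Hb = len(bias), len(bias[0])
    if Bb not in (1, B) or Hb not in (1, H):
        raise ValueError("bias_qk batch/head dims must be 1 or match scores")
    out = []
    for b in range(B):
        bias_b = bias[0 if Bb == 1 else b]  # reuse the single batch slice if Bb == 1
        heads = []
        for h in range(H):
            bias_bh = bias_b[0 if Hb == 1 else h]  # reuse the single head slice if Hb == 1
            heads.append([[s + t for s, t in zip(s_row, t_row)]
                          for s_row, t_row in zip(scores[b][h], bias_bh)])
        out.append(heads)
    return out


zeros = [[[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)] for _ in range(2)]  # B=2, H=3, S=2
shared = [[[[1.0, -1.0], [0.0, 2.0]]]]  # shape [1, 1, 2, 2]
result = add_bias_qk(zeros, shared)
print(all(result[b][h] == shared[0][0] for b in range(2) for h in range(3)))  # True
```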
23 Nov, 2022 · 1 commit
- Use cublaslt in multi transformer FFN (#48052) · b07e6b45
  Committed by MarDino on Nov 23, 2022
```
* use fused mlp in multi transformer
* Restruct code
* use cublaslt to fuse ffn
* fix conflict
```
22 Nov, 2022 · 2 commits

CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203) · df4dfda0
Committed by Tian Zheng on Nov 22, 2022
```
* Skip tests that use fused_ops on H100

* Add error message to FusedOps on H100
```

[PHI decoupling] remove "gpu_device_function.h" in fluid. (#48117) · 4da1a0fe
Committed by huangjiyi on Nov 22, 2022

* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi

* update copyright years

* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi

* rm dependence to "gpu_device_function.h" in fluid

* rm gpu_device_function.h etc in fluid

* fix rocm-compile bugs

* fix cuda_helper_test.cu bugs

21 Nov, 2022 · 1 commit

mma qk tensor_core (#48087) · d79eda71
Committed by lzy on Nov 21, 2022

* use mma for QK dot computing in fused_multi_transformer.
* Update fused_multi_transformer_op.cu.h

18 Nov, 2022 · 1 commit

Fused QKVBiasAdd and Transpose with Split Q, KV (#47680) · d595928e
Committed by MarDino on Nov 18, 2022

* fused qkvBiasAdd and transpose with split qkv

* fix typo

* fix format

* fix name

* add annotation

* fix comment

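
Commit d595928e fuses the QKV bias add and the transpose while splitting Q from K/V. The pure-Python sketch below is a rough reference for what such a fused step computes. It assumes the QKV projection output is laid out as [batch, seq, 3, num_head, head_dim] and the bias as [3, num_head, head_dim]. It returns Q as [batch, num_head, seq, head_dim], with K and V stacked together. These layouts and the name `qkv_bias_transpose_split` are assumptions for illustration, not the kernel's documented interface.

```python
def qkv_bias_transpose_split(qkv, bias):
    """Add bias to a [B][S][3][H][D] QKV tensor and transpose to [B][H][S][D].

    Returns (q, kv) where q is [B][H][S][D] and kv is [k, v], each [B][H][S][D].
    """
    B, S = len(qkv), len(qkv[0])
    H, D = len(bias[0]), len(bias[0][0])

    def gather(t):
        # t selects Q (0), K (1) or V (2); output is indexed as [b][h][s][d].
        return [[[[qkv[b][s][t][h][d] + bias[t][h][d] for d in range(D)]
                  for s in range(S)]
                 for h in range(H)]
                for b in range(B)]

    return gather(0), [gather(1), gather(2)]


B, S, H, D = 1, 2, 1, 2
qkv = [[[[[100 * t + 10 * s + d for d in range(D)] for h in range(H)]
         for t in range(3)] for s in range(S)] for b in range(B)]
bias = [[[1000 * t + d for d in range(D)] for h in range(H)] for t in range(3)]
q, kv = qkv_bias_transpose_split(qkv, bias)
print(q[0][0])  # [[0, 2], [10, 12]]
```

A fused kernel gains by doing the add and the index permutation in one pass, instead of materializing the biased tensor and then transposing it.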

BaiXuePrincess / Paddle is in sync with the project it was forked from
