Commits · e43f710212a2b8f46a203ddb51fc88d90129888d · PaddlePaddle / Paddle

16 Jan, 2023 2 commits
- Y
  [Paddle-TRT] support nhwc (#49633) · e43f7102
  Committed by Yuanle Liu on Jan 16, 2023
```
* add trt_support_nhwc_pass
```
  e43f7102
- Y
  add gpu_cpu_map_matmul_to_mul_pass to kGpuLowerPrecisionPasses (#49753) · 07514139
  Committed by Yuanle Liu on Jan 16, 2023
```
* add gpu_cpu_map_matmul_to_mul_pass to kGpuLowerPrecisionPasses

* disable fc_elementwise_layernorm_fuse_pass in mixed precision
```
  07514139
13 Jan, 2023 1 commit
- W
  add oss flash fmha and fmhca support (#49438) · a48b8e2c
  Committed by Wang Bojun on Jan 13, 2023
```
* add fmha_flashattention oss plugin
```
  a48b8e2c
09 Jan, 2023 2 commits
- W
  Preln groupnorm (#49463) · 591be3bd
  Committed by wenbin on Jan 09, 2023
```
* skip_groupnorm

* init

* preln

* add ut

* more assert

* set timeout

* fix windows ci issue
```
  591be3bd
- G
  
  Unify the pass of the map class (#49568) · ee49994f
  Committed by gem5 on Jan 09, 2023
  
  ee49994f
06 Jan, 2023 1 commit
- Y
  
  [Inference] fix pass_builder (#49595) · 44cb3da3
  Committed by Yuanle Liu on Jan 06, 2023
  
  44cb3da3
05 Jan, 2023 1 commit
- W
  
  [Inference] inplace all reshape op (#49146) · 017af746
  Committed by Wilber on Jan 05, 2023
  
  017af746
04 Jan, 2023 1 commit
- L
  
  add multi_devices_fused_multi_transformer_encoder_pass and cherry-pick from 48349 (#49383) · 29eec2dd
  Committed by lzy on Jan 04, 2023
  
  29eec2dd
03 Jan, 2023 1 commit

[Paddle Inference] Implement conv2d_fusion NHWC format using cutlass (#47989) · c123dd1e

Committed by zhoutianzi666 on Jan 03, 2023

* Implement conv2d_fusion NHWC format using CUTLASS
* Add unit testing for CUTLASS Conv in inference
* Add experimental API for CUTLASS.

c123dd1e

22 Dec, 2022 1 commit
- G
  
  Enable identity_scale_op_clean_pass by default (#49227) · 9dac1e71
  Committed by gem5 on Dec 22, 2022
  
  9dac1e71
19 Dec, 2022 1 commit
- W
  [Paddle Inference] General optimization for no_varlen skiplayernorm (#49039) · b50dbe0b
  Committed by Wangzheee on Dec 19, 2022
```
* General optimization for no_varlen embedding layernorm
```
  b50dbe0b
14 Dec, 2022 2 commits
- Y
  
  [Paddle Inference] rewrite convert_to_mixed_precision (#48853) · 28ea9aad
  Committed by Yuanle Liu on Dec 14, 2022
  
  28ea9aad
- H
  Deleted mkldnn_inplace_pass code (#47818) · 3cfb2e1a
  Committed by Hulek on Dec 14, 2022
```
* Deleted mkldnn_inplace_pass code

* Fixed error with cmake

* Resolve conflicts
```
  3cfb2e1a
12 Dec, 2022 1 commit
- F
  
  fix: Move the pass location to the appropriate location (#48951) · 6698e8d1
  Committed by feng_shuai on Dec 12, 2022
  
  6698e8d1
08 Dec, 2022 4 commits
- R
  rewrite delete_weight_dequant_linear_op_encoder/decoder pass (#48650) · 95332bef
  Committed by RichardWooSJTU on Dec 08, 2022
```
* rewrite delete_weight_deqquant_linear_op_encoder/decoder pass
```
  95332bef
- W
  [Paddle Inference] General optimization for no_varlen embedding layernorm (#48580) · 22bfa579
  Committed by Wangzheee on Dec 08, 2022
```
* general optimization no_varlen embedding layernorm
```
  22bfa579
- W
  
  [Inference] Enable infer shape cache. (#48312) · f88713e1
  Committed by Wilber on Dec 08, 2022
  
  f88713e1
- W
  
  [Inference] inference add cinn interface (#48741) · 3a387df6
  Committed by Wilber on Dec 08, 2022
  
  3a387df6
06 Dec, 2022 1 commit
- Y
  
  [Paddle Inference] Add float_to_half_pass to support inference with mixed precision (#47993) · c5a45cc6
  Committed by Yuanle Liu on Dec 06, 2022
  
  c5a45cc6
05 Dec, 2022 1 commit

Reverse roll fuse (#46914) · feb68dd1

Committed by Wang Bojun on Dec 05, 2022

* pass

* pass

* draft version

* share mem opt

* remove sharemem

* add pattern for the case with circle_shift=0

* add UT

* pass opt

* test_fix

* code-commit

* code-style

* code style

* code-style

* ut-fix

* op teller refine

* resolve conflict

* adjust position op_teller list and pass order for swin

* ut code style update

* adjust paddle pass order

* refine pass order

* refine pass order

* refine pass order

feb68dd1

30 Nov, 2022 2 commits
- F
  
  feat:add the support for vit_attention_op on gpu (#48515) · e9ca7600
  Committed by feng_shuai on Nov 30, 2022
  
  e9ca7600
- R
  Add int8 support in fused_multi_transformer_pass and fuse_multi_transformer_layer_pass (#48209) · 12486712
  Committed by RichardWooSJTU on Nov 30, 2022
```
* delete unnecessary shape and slice op
Co-authored-by: NYour Name <you@example.com>
```
  12486712
23 Nov, 2022 1 commit
- W
  
  add map_depthwise_conv_to_conv pass (#47955) · 3daf5185
  Committed by Wilber on Nov 23, 2022
  
  3daf5185
21 Nov, 2022 2 commits

add fc-residual quantization (#46917) · fed0ed34

Committed by Sylwester Fraczek on Nov 21, 2022

* add fc-residual quantization

* revert removal of check for use_mkldnn

* fix bug

* add disable_logs

* review fix

call twice AreScalesPresntForNodes instead of if-else

* rewrite residual input to output

* revert fc mkldnn taking residual data

* format fix

* fix LoDTensor->DenseTensor

* LoDTensor->DenseTensor

* output->input

* revert changes to unsupported script

revert changes to unsupported script

* remove fc residualdata from output blocklist in cpu_bfloat16_pass.cc

fed0ed34

- R
  delete unnecessary shape and slice op (#48112) · 41483383
  Committed by RichardWooSJTU on Nov 21, 2022
  41483383

16 Nov, 2022 1 commit
- P
  Add bf16 data type support to oneDNN bilinear_interp kernel (#46770) · 8e6315e4
  Committed by Piotr Paturej on Nov 16, 2022
```
* Enable bf16 in oneDNN bilinear_interp kernel

* Fix bilinear_interp_v2 not enabled in models

* Remove unnecessary checks
```
  8e6315e4
15 Nov, 2022 1 commit
- J
  Added optimization pass for oneDNN layernorm kernel (#47782) · 519e7426
  Committed by jakpiase on Nov 15, 2022
```
* optimization for ln

* fix

* added output to gpd

* added formatting

* fix
```
  519e7426
10 Nov, 2022 2 commits
- Z
  [search && paddle inference]add roformer pass&&plugin novarlen version (#47523) · 0f3fb562
  Committed by zhangxin81 on Nov 10, 2022
```
* add roformer pass&&plugin（novarlen）
```
  0f3fb562
- R
  Fuse multi transformer layer pass (#47541) · 1e3245a8
  Committed by RichardWooSJTU on Nov 10, 2022
```
* add fuse_multi_transformer_layer_pass
```
  1e3245a8
09 Nov, 2022 2 commits

- J
  Fix U2++ perf (#47780) · b1fb2360
  Committed by joanna.wozna.intel on Nov 09, 2022
  b1fb2360

Enable fc passes (#45704) · 7e914386

Committed by Paulina Gacek on Nov 09, 2022

* Analysis API interface for disabling fc passes

* Unit tests corrected

* Python API added

* test runs only when PADDLE_WITH_MKLDNN

* Fc op changed to relu in matmul_op_test

* Disable fc passes in tests where acc drops

* code formating

* Unit test for analysisConf added

* Unit test gpu added

* fc passes disabled when iterations=0 in gru test

* style

* passes disabled when fp32 in gru test

* fc passes disabled in lstm test

* Import from inference, not fluid in doc

7e914386

08 Nov, 2022 1 commit
- K
  
  add fuse_multi_transformer passes to fp16. test=develop (#47676) · caca5687
  Committed by Kaipeng Deng on Nov 08, 2022
  
  caca5687
07 Nov, 2022 1 commit

suqeeze2 + transpose2 fuse onednn (#47592) · fa874a46

Committed by Hui Zhang on Nov 07, 2022

* suqeeze2 transpose2 fuse onednn

* format

* fix output shape

* fix conflict

* format

* format

* remove useless

* remove log

* simply pass

* fix comment

* fix

* fix msg

* fix error msg

* format

fa874a46

04 Nov, 2022 1 commit
- J
  Optimized oneDNN FC and added operator+unsqueeze2 and operator+reshape2 oneDNN fuse passes (#47391) · 9e006987
  Committed by jakpiase on Nov 04, 2022
```
* tmp save

* minor chnage

* CI fix

* added FC optimizations

* latest update

* CI fix

* fixed bug with fusing fc
```
  9e006987
03 Nov, 2022 1 commit
- Y
  Fix ComputePropagateScalesMkldnnPass of MKLDNN (#47574) · 5fc92943
  Committed by yeliang2258 on Nov 03, 2022
```
* add constant_folding_pass pass for mkldnn int8

* update UpdateScaleOpInOutScales
```
  5fc92943
26 Oct, 2022 2 commits

Preln_Layernorm_Shift_Partition (#47099) · d17d0cd1

Committed by wenbin on Oct 26, 2022

* prelnlayernorm_shift

* add ut

* remove paddle_enforce

* remove useless

* add UT

* remove UT

* add UT

* set timeout

d17d0cd1

FC/matmul(v2) + scale fuse pass (#47127) · c1c2be2d

Committed by Sławomir Siwek on Oct 26, 2022

* fc/matmuls + scale fuse pass

* remove double-extension

* add unit tests

* comments from review

* codestyle

* add pass to int8 list

* new codestyle

* attr name typo

c1c2be2d

20 Oct, 2022 2 commits
- F
  
  fix:use constant_fold for vit pass (#47211) · ac666538
  Committed by feng_shuai on Oct 20, 2022
  
  ac666538
- K
  Add FusedMultiTransformer fuse pass for GPT3 (#45907) · 5a2e5179
  Committed by Kaipeng Deng on Oct 20, 2022
```
* add fused_multi_transformer_encoder/decoder pass, run GPT-3 success
```
  5a2e5179
18 Oct, 2022 1 commit

Merge layernorm trt fuse (#46320) · 5e9f491e

Committed by Wang Bojun on Oct 18, 2022

* first version, accuracy corrected

* disable debug print

* use blockReduceSum in phi

* add UT

* add opCompat

* code style

* code refine

* bug fix

* code refine

* test fix

* bugfix

* codesytle fix

* code style

* code-style

* code-style

* code-style

5e9f491e

PaddlePaddle / Paddle synced successfully about 1 year ago