提交 · e337d2807a6da9ba70f7a56b334aae781066215e · PaddlePaddle / Paddle

30 11月, 2022 1 次提交
- R
  Add int8 support in fused_multi_transformer_pass and fuse_multi_transformer_layer_pass (#48209) · 12486712
  由 RichardWooSJTU 提交于 11月 30, 2022
```
* delete unnecessary shape and slice op
Co-authored-by: NYour Name <you@example.com>
```
  12486712
15 11月, 2022 1 次提交
- J
  Added optimization pass for oneDNN layernorm kernel (#47782) · 519e7426
  由 jakpiase 提交于 11月 15, 2022
```
* optimization for ln

* fix

* added output to gpd

* added formatting

* fix
```
  519e7426
08 11月, 2022 1 次提交

由 Paulina Gacek 提交于 11月 08, 2022

* Split kernel registered, tests for uint/int added

* Split quantized

* Split output scales calculated only once

* NearestInterp test fix reversed

* DequantizeOutputs corrected

130db92a

07 11月, 2022 1 次提交

suqeeze2 + transpose2 fuse onednn (#47592) · fa874a46

由 Hui Zhang 提交于 11月 07, 2022

* suqeeze2 transpose2 fuse onednn

* format

* fix output shape

* fix conflict

* format

* format

* remove useless

* remove log

* simply pass

* fix comment

* fix

* fix msg

* fix error msg

* format

fa874a46

04 11月, 2022 1 次提交
- J
  Optimized oneDNN FC and added operator+unsqueeze2 and operator+reshape2 oneDNN fuse passes (#47391) · 9e006987
  由 jakpiase 提交于 11月 04, 2022
```
* tmp save

* minor chnage

* CI fix

* added FC optimizations

* latest update

* CI fix

* fixed bug with fusing fc
```
  9e006987
20 10月, 2022 1 次提交
- K
  Add FusedMultiTransformer fuse pass for GPT3 (#45907) · 5a2e5179
  由 Kaipeng Deng 提交于 10月 20, 2022
```
* add fused_multi_transformer_encoder/decoder pass, run GPT-3 success
```
  5a2e5179
18 10月, 2022 1 次提交

Merge layernorm trt fuse (#46320) · 5e9f491e

由 Wang Bojun 提交于 10月 18, 2022

* first version, accuracy corrected

* disable debug print

* use blockReduceSum in phi

* add UT

* add opCompat

* code style

* code refine

* bug fix

* code refine

* test fix

* bugfix

* codesytle fix

* code style

* code-style

* code-style

* code-style

5e9f491e

17 10月, 2022 1 次提交

Layernorm shift partition enhance (#46816) · 9e08633c

由 Wang Bojun 提交于 10月 17, 2022

* first version of ln_s_p with s>0

* refine and UT

* pass opt draft

* pass opt

* code refine

* code-style

* bug fix

* fix ci test

* code style

9e08633c

10 10月, 2022 1 次提交

Add fc residual pattern (#46757) · 0c789ae5

由 Sylwester Fraczek 提交于 10月 10, 2022

* fix fc pattern

remove use_bias
add residual input switch
fix references to pattern

* review fixes

0c789ae5

07 9月, 2022 1 次提交

Layernorm shift partition (#45736) · 960109af

由 wenbin 提交于 9月 07, 2022

* first commit

* conver done

* correct format

* layernorm_shift_partition

* correct convert

* redefine plugin

* runable

* bug fix

* modify ShiftPartitionPattern

* correct

* add UT

* modify ut

* compile

* modify enforce

* modify UT

960109af

22 8月, 2022 2 次提交
- J
  Add int8 support for matmul+elementwise_add fuse pass (#45077) · 9e5f3a38
  由 joanna.wozna.intel 提交于 8月 22, 2022
```
* Add int8 support for matmul+elementwiae_add fuse

* Corrections after review and ernie test fix
```
  9e5f3a38
- S
  Extend conv_concat_relu to support all activations (#45089) · d03ef054
  由 Sławomir Siwek 提交于 8月 22, 2022
```
* merge conv_concat_relu to conv_act

* fix typo

* extend unit test

* reuse existing gpd

* codestyle

* enforce mkldnn conv
```
  d03ef054
16 8月, 2022 2 次提交

convert multihead to oss (#45019) · f706d95d

由 feng_shuai 提交于 8月 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI

f706d95d

W

fix new quant (#45155) · 2fb65e44
由 Wangzheee 提交于 8月 16, 2022

2fb65e44

04 8月, 2022 1 次提交

Matmuls with activation and elementwise_add fuses (#44655) · 0420d514

由 Sławomir Siwek 提交于 8月 04, 2022

* Add unit tests

* matmul_v2 + activation

* matmuls + elementwise_add

* matmul_v2 postops

* transform matmul to v2

* opcompat

* fix fusing matmul with multipe outs

* add shape constraints

* remove unused vars

* change pass order

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add alpha constraint

* merge matmul refactor

* trigger CI

* - fix

* - another fix

* code style

* add support for matmul+elementwise_add+activation

* code style

* fix bfloat16 bugs

* change append_binary to append_sum
Co-authored-by: NJacek Czaja <jacek.czaja@intel.com>

0420d514

27 7月, 2022 1 次提交
- P
  fix RemoveIntermediateOut in fuse_elewise_add_act_pass while converting graph to program (#44593) · be132719
  由 pangyoki 提交于 7月 27, 2022
```
* fix RemoveNode in fuse_elewise_add_act_pass

* fix

* change pointer to share_ptr

* fix

* fix

* fix format

* fix

* fix graph_safe_remove_nodes
```
  be132719
19 7月, 2022 1 次提交
- R
  Rename BOOST_GET macros (#44368) · 4b085c57
  由 Ruibiao Chen 提交于 7月 19, 2022
```
* Rename BOOST_GET macros

* Fix conflicts
```
  4b085c57
11 7月, 2022 2 次提交
- Z
  Quantize shape operator (#44124) · d4372a1e
  由 Zuza Gawrysiak 提交于 7月 11, 2022
```
* Quantize shape operator

* Add shape op to propagate scales pass
```
  d4372a1e
- S
  Unify and generalize activation fuse passes (#44185) · 826e2781
  由 Sławomir Siwek 提交于 7月 11, 2022
```
* reduce redundancy

* python code style

* fix int8 ut
```
  826e2781
05 7月, 2022 1 次提交

Refactor quantization of immutable ops (#43973) · e0d7d790

由 Zuza Gawrysiak 提交于 7月 05, 2022

* Refactor quantization of immutable ops

* Fix code formatting

* Fix formatting

* Specify input names

* Fix formatting

* Change string to reference

* Formatting

e0d7d790

30 6月, 2022 1 次提交
- J
  modify graph_pattern to thread_local (#43942) · 6467ca0d
  由 JingZhuangzhuang 提交于 6月 30, 2022
```
* modify graph_pattern to thread_local

* modify graph_pattern to thread_local
```
  6467ca0d
26 6月, 2022 1 次提交
- S
  
  format all files in fluid using new config (#43776) · 576236a0
  由 Sing_chan 提交于 6月 26, 2022
  
  576236a0
23 6月, 2022 1 次提交

[external reviewing] Params to int8 pass (#42625) · b8b2d6a9

由 Sylwester Fraczek 提交于 6月 22, 2022

* sylwek

prototype params to int8 pass

* trying to make warmup work

* wip

* wip

* change test to cpp test

* review fixes, refactoring

* more refactoring

* add erasevars

* change test to fixture

* rename pass

and reorder erasevars and graphsaferemovenodes

* fix

* more refactoring and fixed bug

* formatting

* remove scale count

* enfroce message too short

* remove erasevars

erasevars couldbe cauuse of memory issues

some other fixes

* add count of successfull fuses to name of new nodes

* FindVar -> GetVar and use ConvResidual pattern

* use tensor->clear() instead of new variable

* Update paddle/fluid/framework/ir/mkldnn/params_quantization_mkldnn_pass_tester.cc
Co-authored-by: NTomasz Socha <tomasz.socha@intel.com>

* Update paddle/fluid/framework/ir/mkldnn/params_quantization_mkldnn_pass_tester.cc
Co-authored-by: NTomasz Socha <tomasz.socha@intel.com>

* Update paddle/fluid/inference/tests/api/analyzer_lexical_analysis_gru_tester.cc
Co-authored-by: NTomasz Socha <tomasz.socha@intel.com>

* add log (review fix)c

* review fix (2 functions to one)

* code review: Conv->QuantizeConv

* revert

* fix formatting

* remove unused functions

* add paddle enforce
Co-authored-by: NTomasz Socha <tomasz.socha@intel.com>

b8b2d6a9

21 6月, 2022 1 次提交
- J
  
  Correct elementwise quantization (#43693) · 9aa89b99
  由 joanna.wozna.intel 提交于 6月 21, 2022
  
  9aa89b99
22 5月, 2022 1 次提交

Quantize elementwise sub (#42854) · 2ffb3371

由 Zuza Gawrysiak 提交于 5月 22, 2022

* Add elementwise_sub quantization

* Remove unnecessary comments

* Specify names for tests

* Remove comments

* Remove comments leftovers

2ffb3371

19 5月, 2022 1 次提交

[TensorRT] Support yolov5s (#42688) · a7778930

由 shentanyue 提交于 5月 19, 2022

* support yolov5s static/int8

* fix eltwise_sub and div weight compute

* fix delete_fill_constant_pass

a7778930

12 5月, 2022 1 次提交
- S
  
  Fix some typos in paddle/. (#42408) · 2012672c
  由 Shuangchi He 提交于 5月 12, 2022
  
  2012672c
11 5月, 2022 1 次提交

Move weights and biases scale computing into pass (#42241) · c0652972

由 Zuza Gawrysiak 提交于 5月 11, 2022

* Add int8 scales gathering pass for convolution

* Fix typo

* Add unittest

* Add corrected unit test

* Change test name

* Remove enabling mkldnn in test

* Speed up test

* Change max examples

* Add functional test

* Change test name

* Add new test case

* Rename pass

c0652972

10 5月, 2022 1 次提交
- J
  pdnode_compare (#42597) · 30234dd7
  由 JingZhuangzhuang 提交于 5月 10, 2022
```
* pdnode_compare

* panode compare

* pdnode_compare
```
  30234dd7
28 4月, 2022 1 次提交

Bfloat16 refactor (#42238) · 8ad38701

由 Tomasz Socha 提交于 4月 28, 2022

* Refactor Quantization

* Refactor Dequantization

* Classy solution

* Style I

* Style II

* Style III

* Use VLOG(4) for debug info

* Style IV

8ad38701

04 4月, 2022 1 次提交
- S
  conv + elementwise_add refactor (#41286) · e5e0b726
  由 Sławomir Siwek 提交于 4月 04, 2022
```
* DRY

* change nodes names

* add const prefix

* change asX to as_x in all files
```
  e5e0b726
02 4月, 2022 1 次提交
- W
  [Paddle inference] support new quant_model (#41049) · 1b58ce14
  由 Wangzheee 提交于 4月 02, 2022
```
* paddle inference support new quant_model
```
  1b58ce14
16 3月, 2022 1 次提交

Quantize elementwise mul (#40546) · 2def79bc

由 Zuza 提交于 3月 16, 2022

* Quantize elementwise mul op

* Parametrize elementwise functions

* Fix code formatting

2def79bc

14 3月, 2022 1 次提交

Add an elementwise + activation fusion pass. (#36541) · 3f219160

由 Tomasz Socha 提交于 3月 14, 2022

* Add elementwise add and activation fuse pass

* Fix copy ellision

* More flexible pattern detector

* More flexible fusion pass

* Update lists for pass

* Add support for Pow operator

* Add support for more activation types

* Style

* Rename fusion pass

* First version of tests

* Dirty version of pass

* Polished version

* Update pbtxt

* Style

* Update names

* Style

* Use PADDLE_ENFORCE_EQ

* Save error message to variable

* WO for error checks

* CR

* Static style check

* Add missing 'activation_scale' attribute

* Add relu6 and sigmoid activations

* Style

* Fix fuse list formating

* Sync filenames for fuse pass files

* Fix cmake after move

* Fix registration

* Fix pass name in tests

* Add missing activations to checker

* WIPS

* Working mul op

* Working sub

* Working Add

* Remove pten includes

* Remove some forward declarations

* Remove Includes

* Fixes

* Remove default kernels

* Add check if post_ops attributes are avaliable

* Style

* Code adjustment

* Register default kernels

* We have year 2022 not 2021...
Co-authored-by: Njakpiase <jakpia21@gmail.com>
Co-authored-by: NSylwester Fraczek <sylwester.fraczek@intel.com>

* Fast review fixes
Co-authored-by: Njakpiase <jakpia21@gmail.com>
Co-authored-by: NSylwester Fraczek <sylwester.fraczek@intel.com>

* Review Fix

* Rename one_dnn -> onednn

* Style after review

* Fast and dirty fix for quantization

* Update tests

* Style

* Fix mkldnn_quantizer config

* Add Joanna's suggestion.

* Check if operator is explicitly disables on OneDNN

* Try to use unregistered attributes

* Style

* Test new framework

* FXI

* FXII

* Update test

* Style
Co-authored-by: Njakpiase <jakpia21@gmail.com>
Co-authored-by: NSylwester Fraczek <sylwester.fraczek@intel.com>

3f219160

07 3月, 2022 1 次提交

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

由 Ming-Xu Huang 提交于 3月 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue.
2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2.
2. Act currently only be supported ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrageter.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_opto gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constrains to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Reivew from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward or ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear wiht gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca

24 2月, 2022 1 次提交
- J
  Fix for split op in BF16 inference (#39548) · 75f91ce4
  由 jakpiase 提交于 2月 24, 2022
```
* Fix for split bf16 inference

* added test for pass

* changes after review
```
  75f91ce4
08 2月, 2022 1 次提交
- J
  [Bug fix] Fixed handling of one of the cases in the quantization process (#39342) · e4d475ea
  由 joanna.wozna.intel 提交于 2月 08, 2022
```
* Fix quantization next op findings

* Corrections according to the review
```
  e4d475ea
05 1月, 2022 2 次提交
- J
  
  Add input data type checking in BF16 placement pass (#38702) · 60c51de5
  由 joanna.wozna.intel 提交于 1月 05, 2022
  
  60c51de5
- J
  Quantize nearest_interp and nearest_interp_v2 (#38622) · 1456b02d
  由 joanna.wozna.intel 提交于 1月 05, 2022
```
* Quantize nearest_interp and nearest_interp_v2

* Check if avx_core supported

* Add depthwise_conv2d to supported quantization list
```
  1456b02d
20 12月, 2021 1 次提交

add matmul_scale_fuse_pass (#37962) · ce335c23

由 heliqi 提交于 12月 20, 2021

* add matmul_scale matmul_v2_scale fuse pass

* add scaletensor judge

* modify var name

* add timeout notest;test=coverag

* fix error commit

* fix use_mkldnn attr

* fix use_mkldnn attr

ce335c23

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功