Commits · 66098bff63d252a9d8825fe8f95e6c1f61e337b9 · PaddlePaddle / Paddle

29 Mar, 2023 · 1 commit

Committed by yuehuayingxueluo on Mar 29, 2023

* add fuse adamw pass

* fix some bugs

* fix CIbug

* change chunk_size

* fix CI bug

* rm test_fused_adam_op.py

* fix CI bugs

* fix fuse_adamw_op_pass.cc

* change code style

* fix CI bug

* fix ut bug and use_adamw_op_pass.cc

* fix test_fuse_adamw_pass.py

* fix CI bug

* remove fluid

* fix ci bug

* fix CI bug

66098bff

27 Mar, 2023 · 1 commit

Fused elementwise_(mul/div) (#50428) · 968f7f24

Committed by Sławomir Siwek on Mar 27, 2023

* extract Op and OPMaker to .h

* extend pattern for fused_op

* set "with_residual" default to false

* adjust fuse passes

* remove fc+eltwise flag

* fused_output_scale

* activation attrs

* remove extra attrs

* fix int8/bf16 unit tests

* simplify RecomputeOutputDims

* remove unused method

* Add description for attributes

* add extra check

* adjust op compats

* update quantize test

* fix protobuf parsing error

* fix int8 performance

* fused elementwises

* merge develop

* remove activation

* restore activation for existing add/sub ops

22 Mar, 2023 · 5 commits

Correct lstm qat test (#51499) · 31f81685

Committed by joanna.wozna.intel on Mar 22, 2023

Add fused_feed_forward pass (#50423) · 5dda0ef6

Committed by Ghost Screaming on Mar 22, 2023

* Add fused_feed_forward pass for semi-automatic static graph training.

* Add fused_feedforward property in parallel_executor.cc

* Polish code.

* Polish fused feed_forward pass code. Support use_dropout1 and use_dropout2 option.

* Support model parallel in fused_feedforward pass.

Extract fused_transpose op dedicated for oneDNN fuse passes (#50021) · 02296977

Committed by Sławomir Siwek on Mar 22, 2023

* extract common methods to reuse

* add header for transpose ops

* fused_transpose

* Split big function

* transpose2 tests

* fused_transpose

* Apply extra attributes

* add pbtxt file

* update pbtxt

* Merge develop

* add more strict op compats

* code style

* remove mkldnn_data_type

* unify SetOutMemDescWithReshape2FuseSupport

* adjust quantize-dequantize for transpose

* remove appendact

* transpose2 quantization

* fix int8 tests

* adjust transpose_op to current develop

* delete fusion code from transpose_kernel

* add fused transpose to NHWC unittest

* change order

[XPU] optimize graph if beam_size=1 (#51732) · 720b14e3

Committed by zhupengyang on Mar 22, 2023

remove duplicate mkldnn_data_type (#51598) · 80472116

Committed by Sylwester Fraczek on Mar 21, 2023

21 Mar, 2023 · 1 commit

[PHI decoupling] Move DataType* from paddle:experimental to phi namespace (#51716) · 4638a62e

Committed by iSerendipity on Mar 21, 2023

* move DataType from paddle::experimental to phi

* convert namespace

* convert namespace

* convert namespace

* clarify namespace

* convert more datatype

* Revert "convert more datatype"

This reverts commit 083b462959e6a22d4d8767707b628b95b396642e.

* convert more in auto_code_generator

* fix conflicts for XPU

* fix namespace conflicts

* fix errors

* Revert "fix errors"

This reverts commit f9d9958b54ee32141112274c8a5c3c381ab0f876.

* fix errors

* fix formatting

20 Mar, 2023 · 2 commits

- Fix fc_xpu_fuse_pass Attribute (beta) is not set correctly error (#51801) · 79bc9c0d
  Committed by chalsliu on Mar 20, 2023
- [xpu] fused_multi_transformer_xpu pass&kernel support (#51571) · 52e1742f
  Committed by mayang002 on Mar 20, 2023

16 Mar, 2023 · 1 commit

split layernorm pass (#51228) · 3f3372b6

Committed by wenbin on Mar 16, 2023

* split pass

* fix compile

* fix ut

* more time

* modify ut

* reduce dim

* fix compile

* reshape weight

* tensor

* remove enforce

* static shape ut

* batchsize

* reorder pass

* minus test cases

* windows timeout

* windows time out

* remove test for windows

* correct

* sssss

* xxx

15 Mar, 2023 · 1 commit

[PHI] remove operator.h in blas.h (rebase to latest codebase) (#51472) · 427712df

Committed by iSerendipity on Mar 15, 2023

* Revert "Revert "【Hackathon No.67】remove operator.h in blas.h (#50989)" (#51467)"

This reverts commit b9d91531.

* remove cout

* add header

* fix missing header

* fix refer fluid error

* fix missing header

* Update repeat_interleave_grad_kernel_impl.h

Change to phi style datatype.

* Update repeat_interleave_grad_kernel_impl.h

Fix missing header

* datatype fluid -> phi

* paddle::experimental -> phi

* fix reference error

* fix reference error

* fix reference error

* fix errors

* fix missing FLAGS

* fix missing headers

* fix missing headers

* fix missing headers

* fix missing headers

* fix missing header

* fix missing header

* fix errors

14 Mar, 2023 · 1 commit

- [Hackathon NO.73] Add temporal_shift operator for Paddle-TRT (#51207) · e79699fb
  Committed by Sonder on Mar 14, 2023

13 Mar, 2023 · 3 commits

Fused softplus (#51087) · fdcfa04f

Committed by Sławomir Siwek on Mar 13, 2023

* mkldnn->onednn

* fused softplus op + kernel

* remove extra attributes

* add missing handler

* change var name

[Paddle Inference ]use python to generate cutlass code (#50603) · 4e9e23cb

Committed by zhoutianzi666 on Mar 13, 2023

* use python to generate cutlass code

* refine CommonConvKernelPart1, CommonConvKernelPart2

* remove useless code in generate_cutlass_code.sh

* add more config in conv2d_residual

* CommonCutlassConvKernelPart1 and CommonCutlassConvKernelPart2

* add group conv support in util.cu

* remove .sh

* refine name

* make name goodgit status!

* add fuse_alpha

* make code easy to understand

* mot fopen generate in py

* use python script to generate conv2d,group=1 cutlass code

* use const &

* use const & && use python script to generate conv2d/group=1 code

[xpu] optimize multi_encoder_xpu_fuse_pass performance (#51346) · e2cdd4a3

Committed by zhupengyang on Mar 13, 2023

09 Mar, 2023 · 1 commit

- fix maybe-uninitialized compiler warning in Linux (#51336) · 7e56147d
  Committed by Wang Xin on Mar 09, 2023

07 Mar, 2023 · 1 commit

- [XPU] support shared weight; delete isolated node (#51108) · 39a9abaa
  Committed by zhupengyang on Mar 07, 2023

06 Mar, 2023 · 2 commits

- Rewrite multi_gru_fuse_pass_tester & multi_gru_seq_fuse_pass_tester (#50094) · c9a39758
  Committed by Paulina Gacek on Mar 06, 2023
```
* first approach

* test finished

* cpp test deleted

* CmakeList corrected

* multi_gru_seq_fuse_pass rewritten

* dummy cout deleted

* review changes

* timeout extended
```
- convert todos to internal tasks (#51174) · 6b393e45
  Committed by Sławomir Siwek on Mar 06, 2023

02 Mar, 2023 · 2 commits

- Fix performance problem in BF16 models (#50283) · e421c6a6
  Committed by zyfncg on Mar 02, 2023
```
* fix performance drop in BF16 models

* fix test_cpu_quantize_squash_pass
```
- process multiple conv2d_fusion shares weight (#51068) · ae60105d
  Committed by Yuanle Liu on Mar 02, 2023

01 Mar, 2023 · 2 commits

- [XPU] Fix xpu_fuse_pass error caused by weight sharing by other operators. (#51039) · 1054b23e
  Committed by csy0225 on Mar 01, 2023
- [XPU] delete op device (#51029) · c9309942
  Committed by zhupengyang on Mar 01, 2023

28 Feb, 2023 · 3 commits

- Rewrite mkldnn fc rnn fuse pass tester (#50265) · eb22391c
  Committed by Hulek on Feb 28, 2023
```
* Added file

* Tests separated and rewritten, fixed fc_lstm_fuse_pass

* Resolve conflicts
```
- [XPU] support convert fp16 model (#50790) · f265a313
  Committed by zhupengyang on Feb 28, 2023
- forbid tensorrt_engine op's output is a persistable var (#50932) · bbf2bc2b
  Committed by zhoutianzi666 on Feb 28, 2023
```
* forbid tensorrt_engine op's output is a persistable var
```

27 Feb, 2023 · 1 commit

- [TRT] Add sm version check for TensorRT flash attention and cross attention pass/plugin (#50830) · 38dad3b9
  Committed by Wang Bojun on Feb 27, 2023
```
* add sm version check

* use GetGPUComputeCapability
```

24 Feb, 2023 · 1 commit

Fused ops converter (#50751) · 9429936c

Committed by Sławomir Siwek on Feb 24, 2023

* ConvertToFusedOp

* change static to inline
Co-authored-by: NTomasz Socha <tomasz.socha@intel.com>

---------
Co-authored-by: NTomasz Socha <tomasz.socha@intel.com>

23 Feb, 2023 · 2 commits

- [XPU] Migrate xpu_embedding_with_eltwise_add_fuse_pass (#50590) · 8d325d82
  Committed by csy0225 on Feb 23, 2023
- [XPU] optimize multi_encoder_xpu_pass (#50759) · 5c9299e5
  Committed by zhupengyang on Feb 23, 2023

22 Feb, 2023 · 1 commit

- [XPU] link out_max to x_max between xpu_fusion_ops (#50690) · 1fd1c169
  Committed by zhupengyang on Feb 22, 2023

21 Feb, 2023 · 1 commit

Support bw invoke fw (#50260) · d8845735

Committed by HappyHeavyRain on Feb 21, 2023

* support bw invoke fw

* fix scale in static_backward.yaml

* fix the bug in tensorrt/convert

* move 'scale','sign' into ops.yaml

* add scale_grad of scale in op_compat.yaml

* change generated_static_op in CMakeLists.txt

20 Feb, 2023 · 1 commit

- [XPU] fix fc_xpu_fuse_pass (#50569) · 77606f5d
  Committed by shentanyue on Feb 20, 2023

17 Feb, 2023 · 2 commits

upgrade oneDNN to 2.7.3 (#46301) · f803b239

Committed by Sławomir Siwek on Feb 17, 2023

* change SHA

* update to oneDNN 2.7

* update to 2.7.1

* update to 2.7.2

* add supported hardsigmoid

* update to 2.7.3

* limit cpu threads for int8 test

* group activations

[XPU] add multi_encoder_xpu_slice_fuse_pass, generate_sequence_xpu_fuse_pass,... · 61469eec

Committed by zhupengyang on Feb 17, 2023
```
[XPU] add multi_encoder_xpu_slice_fuse_pass, generate_sequence_xpu_fuse_pass, generate_sequence_xpu kernel (#50570)
```

16 Feb, 2023 · 4 commits

Add matmul_v2 and fused_matmul to the quantization process and adjust Ernie model test (#50354) · 8686a745

Committed by joanna.wozna.intel on Feb 16, 2023

* Add matmul_v2 to the quantization process and adjust Ernie model test

* Correct cpu_quantize_pass test

* Move op to fuse transformation to placement pass

* Correct test

Rewrite mkldnn conv bn fuse pass tester (#50034) · e2aacd21

Committed by Hulek on Feb 16, 2023

* New onednn test

* checkopoint

* added new test, fixed issue with onednn bias

* fix bias check

* remove prints, refactor code

* delete old test

* update python tests cmake

* Delete depracated conv bias

* Delete outdated bias from convolution test

[XPU][Fleet] Support multi-card infer for xpu (#50490) · 517d8074

Committed by shentanyue on Feb 16, 2023
```
* support xpu multi-card infer

* add ut

* clean code

* clean code

* fix

* fix

* fix

* fix
```

[XPU] fix dropout pass; add multi_encoder_xpu_fuse_pass & multi_encoder_xpu kernel (#50499) · c8aa6405

Committed by zhupengyang on Feb 16, 2023

PaddlePaddle / Paddle synced successfully about 1 year ago