Commits · 25a0b46dc6155b058f7a1ef04550bd2dcf65dbd1 · PaddlePaddle / Paddle

04 Sep, 2023 1 commit
- optimize softmax_mask_fuse (#56877) · 25a0b46d
  Committed by duanyanhui on Sep 04, 2023
31 Aug, 2023 2 commits

Add fused_scale_bias_relu_conv_bnstats OP (#55026) · 71e28b12

Committed by Tian Zheng on Aug 31, 2023

* Add fused_scale_bias_relu_conv_bnstats op

* Review changes

* Fix no CUDNN Frontend build

* Fix PADDLE_ENFORCE format

* Fix PADDLE_ENFORCE CI error

* Rename kernel filename

* Refactor unittest to use paddle eager_op_test

* Fix padding bugs

* Review changes

* test=cuda117

* test=cuda117

[Fluid] Move distributed_fused_lamb_init to phi (#55993) · 0bc369ef
Committed by Zero Rains on Aug 31, 2023

29 Aug, 2023 3 commits

Remove need_move_to_phi (#56371) · daac3829

Committed by Sonder on Aug 29, 2023

* remove flag

* open static build flag

* add searchsorted to list

* add register info for fused layernorm

* fix fused_layernorm_kernel output registe info

* fix stft registe info

* add include

* fix registe info

* add skip fake init for fused_layernorm:residual_out

* fix error

* add distributed_fused_lamb_init to StaticBuildBlackList

* set static_build flag to false

make variable_length_memory_efficient_attention supports mask_broadcast_heads (#56673) · 6839a7b9
Committed by lzy on Aug 29, 2023

[clang-tidy] No.26,27 enable misc-unused-using-decls,misc-unused-alias-decls (#56485) · 138bdf40
Committed by cyberslack_lee on Aug 29, 2023
```
* fix

* fix
```

25 Aug, 2023 2 commits

New ir support fuse bn add act (#56247) · d3f4596a

Committed by hong on Aug 25, 2023

* support new ir load combine

* update

* polish code

* remove print

* update

* update

* update

* polish code

* fix bug

* polish code

* fix compile bug

* fix bug

* revert code

* remove useless code

* polish code

[Paddle Inference] Add bias input of mmha and simplify mmha. (#56411) · 636dc2ff
Committed by xiaoxiaohehe001 on Aug 25, 2023
```
* add_bias_and_simplify_mmha
```

22 Aug, 2023 2 commits
- [XPU] modify add_layernorm_xpu kernel (#56429) · eb0e4d4b
  Committed by jiangfan06 on Aug 22, 2023
- [Paddle Inference] refactor linear_compress (#55490) · ffff3da0
  Committed by FormlessUnit on Aug 22, 2023
```
* Modify kernels to support quantized_matmul

---------
Co-authored-by: superxf <1208713646@qq.com>
```
16 Aug, 2023 2 commits
- Refine FusedNorm comment (#56305) · 12547fb4
  Committed by MarDino on Aug 16, 2023
```
* refine static op return val
```
- [XPU] Add fast_layernorm_xpu_fuse_pass and fast_layernorm_xpu plugin (#56269) · f16e1869
  Committed by jiangfan06 on Aug 16, 2023
15 Aug, 2023 1 commit

[Paddle Inference] Add masked multihead attention kernel and export API. (#55344) · 989c5e87

Committed by xiaoxiaohehe001 on Aug 15, 2023

* support_mmha
* add_python_api
* add_api_doc
* fix_doc_error
* fix_infermeta
* add_infermeta
* add_bf16_cuda_check
* add_bf16_check
* fix_ci_windows
* fix_ci_windows_kernel_register
* fix_test_mmha
* add_cumoffsets
* remove_bias
* delete_mmha_reshape_input_output
* rename_delete_hfile
* remove_fluid

---------
Co-authored-by: yangjianfengo1 <yangjianfeng01@baidu.com>

14 Aug, 2023 2 commits
- [Fluid] Move fused_softmax_mask_upper_triangle to phi (#55769) · 6e40fc1d
  Committed by Sonder on Aug 14, 2023
- Add rmsnorm residual bias add and quant (#55965) · 2ac6a7e4
  Committed by MarDino on Aug 14, 2023
```
* add rmsnorm residual bias add and quant

* refine python interface

* add rmsnorm unittest

* Add layernorm

* fix layernorm unittest

* refine unittest

* fix example code

* fix review comment
```
11 Aug, 2023 2 commits
- Fix the shape of input sin and cos for fused_rope. (#56132) · f60c698f
  Committed by Yiqun Liu on Aug 11, 2023
```
* Fix the shape of input sin and cos for fused_rope.

* Update shape in unittest.
```
- [XPU]Add flip kernel (#55932) · ee003457
  Committed by wz1qqx on Aug 10, 2023
10 Aug, 2023 1 commit

Add variable_length_memory_efficient_attention (#55400) · 4036c937

Committed by lzy on Aug 10, 2023

* add variable_length_memory_efficient_attention
* update variable_length_memory_efficient_attention unittest
* update variable_length_mem_eff_attn's docs and unittest
* update variable_length_mem_eff_attn's docs
* Update test_variable_length_memory_efficient_attention.py
* Update variable_length_memory_efficient_attention.cu
* fix codestyle
* fix variable_length_fmha's docs and unittest
* fix variable_length_fmha's docs

09 Aug, 2023 2 commits
- change index's dtype for int to int64 (#55949) · 8d181e37
  Committed by niuliling123 on Aug 09, 2023
- [XPU] add fused_softmax_mask and fused_softmax_mask_grad. (#55914) · b982af4a
  Committed by houj04 on Aug 09, 2023
08 Aug, 2023 1 commit
- optimize op structure (#55988) · 6bd7f860
  Committed by freeliuzc on Aug 08, 2023
03 Aug, 2023 2 commits
- Optim fused linear grad add (#55927) · 91873469
  Committed by Yuang Liu on Aug 03, 2023
- eliminate small pattern (#55843) · dc4b48f6
  Committed by wz1qqx on Aug 03, 2023
02 Aug, 2023 1 commit
- [XPU]Add conv1d fuse pass (#55719) · 22c7a6eb
  Committed by wz1qqx on Aug 02, 2023
01 Aug, 2023 1 commit
- [XPU] Add fast_where fusion op and XPU micro kernel (#55628) · 07e788f1
  Committed by hong19860320 on Aug 01, 2023
26 Jul, 2023 1 commit
- add sin and cos optional parameters to fused_rope op (#55415) · 581d05bb
  Committed by tianhaodongbd on Jul 26, 2023
24 Jul, 2023 1 commit
- [PHI] add fused_softmax_mask and fused_softmax_mask_grad for CPU. (#55616) · b10b899c
  Committed by houj04 on Jul 24, 2023
20 Jul, 2023 1 commit
- [XPU] fuse cast to conv2d/fc in mixed precision model (#54493) · 4df00939
  Committed by zhupengyang on Jul 20, 2023
19 Jul, 2023 1 commit
- Fix mea segmentation fault error (#55408) · cc262c55
  Committed by sneaxiy on Jul 19, 2023
```
* fix mea seg fault develop

* fix bias_grad seg fault
```
14 Jul, 2023 2 commits
- fix embedding_with_eltwise_add_xpu (#55354) · 95aab366
  Committed by zhupengyang on Jul 14, 2023
- [XPU] Fix yolo_box to support multi-stream based inference (#55310) · 7e4290c5
  Committed by hong19860320 on Jul 14, 2023
13 Jul, 2023 2 commits
- [inference] Add FusedBiasActKernel (#55301) · 0a4d1999
  Committed by freeliuzc on Jul 13, 2023
```
* add init value for CudaSwishFunctor

* add new phi kernel fusedBiasActKernel
```
- fix conv_fusion in multi thread. (#55374) · ceb83562
  Committed by Wilber on Jul 13, 2023
12 Jul, 2023 1 commit

[ONEDNN] Upgrade oneDNN version to v3.1 (#52463) · cfa513f7

Committed by YangQun on Jul 12, 2023

* squash pick the poc code
* fix build after rebase
* fix int8 conv and fc uts
* Fix and clean-up Get_SRC_Scale_Memory
* fix floating point fc uts
* fix test_analyzer_int8_googlenet
* test_analyzer_int8_mobilenetv1
* fix int8 mobilenet v2 and v3
* fix build error after rebase
* [oneDNN] rename library version
* fix conv bias datatype
* try to fix import error
* fix rebase error
* [oneDNN] pack library into python wheel
* add MKLDNN_SHARED_LIB_3 to env_dict
* fix test_analyzer_bert
* fix fill_constant op kernel
* fix ernie and matmul op ut
* fix softplus ut
* fix conv+relu6 fusion ut
* fix hardswish fusion
* fix quant+transpose fusion ut
* fixsgd ut
* fix int8 matmul with flatten
* fix fc+scale fusion
* fix conv/matmul+gelu fusion uts
* fix rebase error
* Revert "fix conv/matmul+gelu fusion uts"
This reverts commit 47eb5e49972bd8f7271a233def9bfb3e98ce78e1.
* upgrade to onednn v3.1
* remove older version onednn
* use densetensor::data() for achieving mean and var in layernorm impl
* comments for atol of integer tests
* fix clang-format
* Revert "remove older version onednn"
This reverts commit 783e57ddfd4401254596eae7d47adb9b03590c09.
* improve binary handle
* fix expand kernel
* Revert "use densetensor::data() for achieving mean and var in layernorm impl"
* always use forward_inference for conv
* remove activation scales
* rollback changes to mkldnn.cmake
* address comments
* port changes to dequantize kernel
* fix merge error
* fix fused_elementwise_kernel
* upgrade onednn version to v3.1.1
* fix some approval error
* fix error msg format
* remove old onednn libs
* try to fix symbolic link issue
* fix cinn test case segfault
* do not explicit link test with onednn
* remove unnecessary changes
* integrate CINN with onednn v3
* link with mkldnn project
* fix cinn build file

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
Co-authored-by: Chen, Xinyu1 <xinyu1.chen@intel.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>

07 Jul, 2023 1 commit
- [XPU] Add layernorm fuse pass (#55154) · eb12739e
  Committed by wz1qqx on Jul 07, 2023
03 Jul, 2023 2 commits
- add linear_compress API (#54140) · c4d5ec66
  Committed by FormlessUnit on Jul 03, 2023
```
* add linear_compress API
```
- Update the rope op according to the comments (#54985) · 2401d48d
  Committed by niuliling123 on Jul 03, 2023
30 Jun, 2023 1 commit
- [XPU] Add conv2d transpose fuse pass (#54904) · 12c15b89
  Committed by mjp9527 on Jun 30, 2023
26 Jun, 2023 1 commit

remove ops from OpsWithFluidKernelNeedMoveToPhi set (#54007) · 733eca85

Committed by Sonder on Jun 26, 2023

* remove ops from OpsWithFluidKernelNeedMoveToPhi set

* open static build flag

* OpsWithFluidKernelNeedMoveToPhi

* open new_executor_static_build

* add infermate for cudnn_lstm

* fix

* update

* fix

* update

* update

* update

* fix pow2 decay

* fix pow2 decay

* recover analysis_predictor.cc

* fix pow2 decay

* fix cudnn lstm

* add output register info for svd

* fix pow2_decay_with_linear_warmup_kernel

* recover test lstm cudnn

* recover svg register codes

* fix register info

* fix reduce sum register info

* add output info for adadelta

* add output info for adadelta

* add output info for adamax

* fix complex abs register info

* add register info for cudnn_lstm_grad

* recover

* fix lstm cudnn

* fix

* fix xpu output registe info

* remove std::cout

* add backend

* remove output info in pow2_decay_with_linear_warmup_kernel

* add judgment in TensorShouldBeFakeInitialized

* recover power_

* close new_executor_static_build

* fix set_value_xpu

20 Jun, 2023 1 commit
- Remove reduntant definition of MPTypeTrait. (#54756) · f469f176
  Committed by Yiqun Liu on Jun 20, 2023

PaddlePaddle / Paddle synced successfully about 1 year ago
