Commits · 6bd7f860d63fd0416090333fe9fca5737bcf990a · PaddlePaddle / Paddle

08 Aug, 2023 · 1 commit
- optimize op structure (#55988) · 6bd7f860
  Committed by freeliuzc on Aug 08, 2023
03 Aug, 2023 · 2 commits
- Optim fused linear grad add (#55927) · 91873469
  Committed by Yuang Liu on Aug 03, 2023
- eliminate small pattern (#55843) · dc4b48f6
  Committed by wz1qqx on Aug 03, 2023
02 Aug, 2023 · 1 commit
- [XPU]Add conv1d fuse pass (#55719) · 22c7a6eb
  Committed by wz1qqx on Aug 02, 2023
01 Aug, 2023 · 1 commit
- [XPU] Add fast_where fusion op and XPU micro kernel (#55628) · 07e788f1
  Committed by hong19860320 on Aug 01, 2023
26 Jul, 2023 · 1 commit
- add sin and cos optional parameters to fused_rope op (#55415) · 581d05bb
  Committed by tianhaodongbd on Jul 26, 2023
24 Jul, 2023 · 1 commit
- [PHI] add fused_softmax_mask and fused_softmax_mask_grad for CPU. (#55616) · b10b899c
  Committed by houj04 on Jul 24, 2023
20 Jul, 2023 · 1 commit
- [XPU] fuse cast to conv2d/fc in mixed precision model (#54493) · 4df00939
  Committed by zhupengyang on Jul 20, 2023
19 Jul, 2023 · 1 commit
- Fix mea segmentation fault error (#55408) · cc262c55
  Committed by sneaxiy on Jul 19, 2023
```
* fix mea seg fault develop

* fix bias_grad seg fault
```
14 Jul, 2023 · 2 commits
- fix embedding_with_eltwise_add_xpu (#55354) · 95aab366
  Committed by zhupengyang on Jul 14, 2023
- [XPU] Fix yolo_box to support multi-stream based inference (#55310) · 7e4290c5
  Committed by hong19860320 on Jul 14, 2023
13 Jul, 2023 · 2 commits
- [inference] Add FusedBiasActKernel (#55301) · 0a4d1999
  Committed by freeliuzc on Jul 13, 2023
```
* add init value for CudaSwishFunctor

* add new phi kernel fusedBiasActKernel
```
- fix conv_fusion in multi thread. (#55374) · ceb83562
  Committed by Wilber on Jul 13, 2023
12 Jul, 2023 · 1 commit

[ONEDNN] Upgrade oneDNN version to v3.1 (#52463) · cfa513f7

Committed by YangQun on Jul 12, 2023

* squash pick the poc code
* fix build after rebase
* fix int8 conv and fc uts
* Fix and clean-up Get_SRC_Scale_Memory
* fix floating point fc uts
* fix test_analyzer_int8_googlenet
* test_analyzer_int8_mobilenetv1
* fix int8 mobilenet v2 and v3
* fix build error after rebase
* [oneDNN] rename library version
* fix conv bias datatype
* try to fix import error
* fix rebase error
* [oneDNN] pack library into python wheel
* add MKLDNN_SHARED_LIB_3 to env_dict
* fix test_analyzer_bert
* fix fill_constant op kernel
* fix ernie and matmul op ut
* fix softplus ut
* fix conv+relu6 fusion ut
* fix hardswish fusion
* fix quant+transpose fusion ut
* fixsgd ut
* fix int8 matmul with flatten
* fix fc+scale fusion
* fix conv/matmul+gelu fusion uts
* fix rebase error
* Revert "fix conv/matmul+gelu fusion uts"
This reverts commit 47eb5e49972bd8f7271a233def9bfb3e98ce78e1.
* upgrade to onednn v3.1
* remove older version onednn
* use densetensor::data() for achieving mean and var in layernorm impl
* comments for atol of integer tests
* fix clang-format
* Revert "remove older version onednn"
This reverts commit 783e57ddfd4401254596eae7d47adb9b03590c09.
* improve binary handle
* fix expand kernel
* Revert "use densetensor::data() for achieving mean and var in layernorm impl"
* always use forward_inference for conv
* remove activation scales
* rollback changes to mkldnn.cmake
* address comments
* port changes to dequantize kernel
* fix merge error
* fix fused_elementwise_kernel
* upgrade onednn version to v3.1.1
* fix some approval error
* fix error msg format
* remove old onednn libs
* try to fix symbolic link issue
* fix cinn test case segfault
* do not explicit link test with onednn
* remove unnecessary changes
* integrate CINN with onednn v3
* link with mkldnn project
* fix cinn build file

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
Co-authored-by: Chen, Xinyu1 <xinyu1.chen@intel.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>

07 Jul, 2023 · 1 commit
- [XPU] Add layernorm fuse pass (#55154) · eb12739e
  Committed by wz1qqx on Jul 07, 2023
03 Jul, 2023 · 2 commits
- add linear_compress API (#54140) · c4d5ec66
  Committed by FormlessUnit on Jul 03, 2023
```
* add linear_compress API
```
- Update the rope op according to the comments (#54985) · 2401d48d
  Committed by niuliling123 on Jul 03, 2023
30 Jun, 2023 · 1 commit
- [XPU] Add conv2d transpose fuse pass (#54904) · 12c15b89
  Committed by mjp9527 on Jun 30, 2023
26 Jun, 2023 · 1 commit

remove ops from OpsWithFluidKernelNeedMoveToPhi set (#54007) · 733eca85

Committed by Sonder on Jun 26, 2023

* remove ops from OpsWithFluidKernelNeedMoveToPhi set

* open static build flag

* OpsWithFluidKernelNeedMoveToPhi

* open new_executor_static_build

* add infermate for cudnn_lstm

* fix

* update

* fix

* update

* update

* update

* fix pow2 decay

* fix pow2 decay

* recover analysis_predictor.cc

* fix pow2 decay

* fix cudnn lstm

* add output register info for svd

* fix pow2_decay_with_linear_warmup_kernel

* recover test lstm cudnn

* recover svg register codes

* fix register info

* fix reduce sum register info

* add output info for adadelta

* add output info for adadelta

* add output info for adamax

* fix complex abs register info

* add register info for cudnn_lstm_grad

* recover

* fix lstm cudnn

* fix

* fix xpu output registe info

* remove std::cout

* add backend

* remove output info in pow2_decay_with_linear_warmup_kernel

* add judgment in TensorShouldBeFakeInitialized

* recover power_

* close new_executor_static_build

* fix set_value_xpu

20 Jun, 2023 · 1 commit
- Remove reduntant definition of MPTypeTrait. (#54756) · f469f176
  Committed by Yiqun Liu on Jun 20, 2023
12 Jun, 2023 · 1 commit
- [inference]conv_fusion support bias's rank equal to input's rank (#54477) · 03dbdbd1
  Committed by Zhang Jun on Jun 12, 2023
```
* support bias's rank equal to input's rank
```
02 Jun, 2023 · 1 commit
- [XPU]Add yolo box fuse pass && kernel (#54163) · a087b9cb
  Committed by wz1qqx on Jun 02, 2023
01 Jun, 2023 · 1 commit

Support static graph code generation for conv2d, conv3d, depthwise_conv2d (#54201) · f3eccb3f

Committed by huangjiyi on Jun 01, 2023

* update

* update cmake

* update

* update

* update

* update

* Revert "update cmake"

This reverts commit 1e1dc1b2bc9967b725201272607f939260070fd4.

* update

* update

* update

* update

26 May, 2023 · 1 commit

[PHI Decoupling]Create PHI shared lib (#53735) · da50a009

Committed by YuanRisheng on May 26, 2023

* create phi so

* fix ci bugs

* fix py3 bugs

* add file

* fix py3 bugs

* fix windows bugs

* perfect so

* fix py3 bugs

* delete all static target in phi

* fix windows bugs

* fix py3 bugs

* fix ci bugs

* fix windows bugs

* fix bugs: gflags can't be linked by dynamic and static lib

* fix bugs that can not load 3rd party

* fix ci bugs

* fix compile bugs

* fix py3 bugs

* fix conflict

* fix xpu bugs

* fix mac compile bugs

* fix psgpu bugs

* fix inference failed

* deal with conflict

* fix LIBRARY_PATH bug

* fix windows bugs

* fix onednn error

* fix windows compile bugs

* fix windows compile bugs

* fix test_cuda_graph_static_mode_error aborted

* fix windows bugs

* fix mac-python3 error

* fix hip compile bugs

* change mode to static

* change to static mode

* fix ci bugs

* fix py3 bugs

* fix windows bugs

* fix bugs

* add static flag

* add PADDLE_API

* change position of PADDLE_API

* fix windows bugs

* change mode to dynamic lib

* fix windows static bugs

* deal with conflict

* fix windows unit bug

* fix coverage

* deal with conflict

* fix windows-inference

* fix py3 bugs

* fix bugs when compile type_info

* fix compile bugs

* fix py3 bugs

* fix windows bugs

* fix windows openblas

* fix xpu bugs

* fix enforce_test in windows

* update code according comment

* fix windows cmake bug

* fix windows bugs

* fix windows bugs

* delete cinn unittest

* fix cinn bugs

---------
Co-authored-by: lzydev <1528794076@qq.com>

24 May, 2023 · 1 commit
- [XPU]Add act add fuse (#53965) · f55f9d79
  Committed by wz1qqx on May 24, 2023
22 May, 2023 · 1 commit
- multi_encoder support adaptive seqlen (#53982) · 664a2753
  Committed by zhupengyang on May 22, 2023
19 May, 2023 · 2 commits
- test,test=develop (#53811) · 10758725
  Committed by Galaxy1458 on May 19, 2023
- test,test=develop (#53843) · c1f4005a
  Committed by Galaxy1458 on May 19, 2023
18 May, 2023 · 2 commits

Fused elementwises kernels and ops (#51427) · fb4a6ecf

Committed by Hulek on May 18, 2023

* Fused elementwises kernels and ops

* change fuse pass name

* adjust .pbtxt files

* adjust quantization attributes

* add missing arguments and fix others, review fixed

* simplify fused kernel registration

* fix elementwise unit tests

* reuse one fused elementwise op

* adjust proto

* Add supported datatypes

* Change 'Scale' to 'scale' in tests, change some tests to onednn

* Revert breaking changes

* Fix unit tests

* Delete obsolete test cases

* Delete commented out code

* Fix codestyle

* delete temporary condition

* fix conflicts and delete duplicate fusing

* Fix code after merge

* Move tests to new directory

* fix tests volatility

* Rename test_elementwise_add_onednn_op.py to test_elementwise_add_mkldnn_op.py

* Update CMakeLists.txt add mkldnn op test

---------
Co-authored-by: Silv3S <slawomir.siwek@intel.com>

move fusion_group kernel to phi (#53781) · 26da689d

Committed by huangjiyi on May 18, 2023

16 May, 2023 · 1 commit

Move fused batchnorm to Phi (#53476) · 5e5481d8

Committed by Sonder on May 16, 2023

* trans fused batch norm Compute function

* trans batch norm register info to phi

* trans fused batch norm grad Compute

* trans batch norm grad register info

* add sig file

* update sig file

* Update fused_bn_activation_kernel.cu

* Update fused_bn_activation_grad_kernel.cu

* fix

* Rename fused_bn_activation_kernel_grad.cu to fused_bn_activation_kernel.cu

* fix

* fix

* fix CudnnDataType error

* fix

* fix include

* update

* add #if

* add fused bn act to cmakelist.txt

* update  cmakelist

* fix #ifdef error

* add timeout set

* add env set

* fix

* fix

* Update fused_bn_activation_sig.cc

08 May, 2023 · 1 commit
- [XPU] Optimize fp16 xpu models (#53523) · 0a59825e
  Committed by wz1qqx on May 08, 2023
05 May, 2023 · 2 commits
- remove some [-Wunused-parameter]warning (#53397) · 58435ae5
  Committed by Galaxy1458 on May 05, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop
```
- [XPU] Fusion of gather and assign operators to fused_mt op for reducing memory usage (#53262) · 2039115c
  Committed by shentanyue on May 05, 2023
28 Apr, 2023 · 1 commit

Dropout optimize & clean broadcast inT and ElementwiseType (#52969) · d611e48c

Committed by Bo Zhang on Apr 28, 2023

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* clean ElementwiseT and InT for BroadcastKernel

* default axis and clean inT

* remove redundant fast divmod computation

* optimize drop_nd & drop_nd_grad

* optimize BroadcastDataLoader bf16 fp16

* rm InT etc. after merge develop

* delete constexpr for windows ci

* fix conflict

* fix conflic with develop

* fix conflic

* new clean

* clean

27 Apr, 2023 · 1 commit

Move fused feedforward (#53166) · 25b4ba7f

Committed by Sonder on Apr 27, 2023

* trans fused_feedward Compute function to phi

* add register info

* remove maxfunctor

* move fused feedward to phi

* remove sig file

* remove fliud include

* add include

* add include

* add sig file

* add output register info

* fix sig file

* Update fused_feedforward_sig.cc

* fix grad kernel

* update output register info

* fix

* open fused_feedforward static build

* add optional and fix code style

* fix output info for fused attention

* add optional param

* merge

26 Apr, 2023 · 1 commit
- Fix fused_attention_op and fused_feedforward_op bugs in xpu (#53318) · 1164626c
  Committed by Ruibiao Chen on Apr 26, 2023
```
* Fix fused_attention_op and fused_feedforward_op bugs in xpu

* Fix d_x alloc errors for fused_feedforward_grad_kernel
```
24 Apr, 2023 · 1 commit

Move fused feedforward xpu (#53196) · 83c2e682

Committed by Sonder on Apr 24, 2023

* add sig file

* trans fused feedforward compute function to phi

* remove fluid include

* delete old register info

* fix build error

* trans fused feedforward grad xpu to phi

21 Apr, 2023 · 1 commit
- Update for fused linear grad add. (#53118) · 63c83870
  Committed by Yuang Liu on Apr 21, 2023
17 Apr, 2023 · 1 commit

[Paddle-Inference] Add cutlass conv2d_depthwise (#51792) · bd3b096a

Committed by zhoutianzi666 on Apr 17, 2023

* initial commit for cutlass_teller

* second commit for cutlass_teller

* add conv2d_depthwise python template

* add conv2d_depthwise cutlass template

* /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h

* refine code in Conv2dFusionCanSupport

* add macro in cutlass_teller.h

* add 3x3 5x5 teller

* add groups not 1 or conv2d_depthwise teller

* only generate conv2d_depthwise kernels where ic is a multiple of 8

* add EXPLICIT in cutlass_teller.h

* final commit

* add split_k_slices in conv2d_depthwise

* make stages == 2

* refactor part of the code

* add CutlassFusionType

* solve illegal memory

* make stride_h=stride_w && make dilation==1

* must check HasAttr(use_cutlass) before GetAttrIfExists

* add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String

* modify decl.h and util.cu
