Commits · 2d98758c1b03c7670de95051930e278d981e15ac · PaddlePaddle / Paddle

13 Jul, 2023 1 commit
- Add matmul_int8 op (#55228) · 27cc0df5
  Authored by RichardWooSJTU on Jul 13, 2023
```
* add matmul int8
```
12 Jul, 2023 2 commits

[ONEDNN] Upgrade oneDNN version to v3.1 (#52463) · cfa513f7

Authored by YangQun on Jul 12, 2023

* squash pick the poc code
* fix build after rebase
* fix int8 conv and fc uts
* Fix and clean-up Get_SRC_Scale_Memory
* fix floating point fc uts
* fix test_analyzer_int8_googlenet
* test_analyzer_int8_mobilenetv1
* fix int8 mobilenet v2 and v3
* fix build error after rebase
* [oneDNN] rename library version
* fix conv bias datatype
* try to fix import error
* fix rebase error
* [oneDNN] pack library into python wheel
* add MKLDNN_SHARED_LIB_3 to env_dict
* fix test_analyzer_bert
* fix fill_constant op kernel
* fix ernie and matmul op ut
* fix softplus ut
* fix conv+relu6 fusion ut
* fix hardswish fusion
* fix quant+transpose fusion ut
* fixsgd ut
* fix int8 matmul with flatten
* fix fc+scale fusion
* fix conv/matmul+gelu fusion uts
* fix rebase error
* Revert "fix conv/matmul+gelu fusion uts"
This reverts commit 47eb5e49972bd8f7271a233def9bfb3e98ce78e1.
* upgrade to onednn v3.1
* remove older version onednn
* use densetensor::data() for achieving mean and var in layernorm impl
* comments for atol of integer tests
* fix clang-format
* Revert "remove older version onednn"
This reverts commit 783e57ddfd4401254596eae7d47adb9b03590c09.
* improve binary handle
* fix expand kernel
* Revert "use densetensor::data() for achieving mean and var in layernorm impl"
* always use forward_inference for conv
* remove activation scales
* rollback changes to mkldnn.cmake
* address comments
* port changes to dequantize kernel
* fix merge error
* fix fused_elementwise_kernel
* upgrade onednn version to v3.1.1
* fix some approval error
* fix error msg format
* remove old onednn libs
* try to fix symbolic link issue
* fix cinn test case segfault
* do not explicit link test with onednn
* remove unnecessary changes
* integrate CINN with onednn v3
* link with mkldnn project
* fix cinn build file

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
Co-authored-by: Chen, Xinyu1 <xinyu1.chen@intel.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>

[clang-tidy] enable `readability-container-size-empty` check (#55279) · be3a6fa7

Authored by Wang Xin on Jul 12, 2023

* [clang-tidy] enable readability-container-size-empty check

* fix test_custom_kernel Failed

* add clang-tid-10 in dockerfile

* add clang-tidy in dockerfile

* fix bug

29 Jun, 2023 1 commit
- Fix compiling on XPU related to MPTypeTrait. (#54924) · 7353e9e9
  Authored by Yiqun Liu on Jun 29, 2023
```
* Fix compiling on XPU related to MPTypeTrait.

* Unify the use of MPTypeTrait.

* Fix compiling error.
```
20 Jun, 2023 1 commit

static graph autogen code support for matmul op (#54338) · ad80fbfe

Authored by Wang Xin on Jun 20, 2023

* static graph autogen code support for matmul op

* fix bug

* fix bug

* fix bug

* fix bug

* fix bug

* fix bug

* fix bug

12 Jun, 2023 1 commit
- fix gcc12 error (#54535) · 89bcf894
  Authored by risemeup1 on Jun 12, 2023

10 Jun, 2023 1 commit
- Fix bugs in fused_linear_epilogue (#54512) · 0a704e14
  Authored by limingshu on Jun 10, 2023

08 Jun, 2023 1 commit
- fuse vit attention for faster-rcnn on BML (#54139) · fc880209
  Authored by cmeng on Jun 08, 2023

05 Jun, 2023 1 commit
- Fix some compile errors with C++17 (#54282) · 68d81d0e
  Authored by huangjiyi on Jun 05, 2023

01 Jun, 2023 2 commits

Support static graph code generation for conv2d, conv3d, depthwise_conv2d (#54201) · f3eccb3f

Authored by huangjiyi on Jun 01, 2023

* update

* update cmake

* update

* update

* update

* update

* Revert "update cmake"

This reverts commit 1e1dc1b2bc9967b725201272607f939260070fd4.

* update

* update

* update

* update

mv all unittests test (#53235) · b0e86d55

Authored by tianshuo78520a on Jun 01, 2023

* mv all unittests test

* fix error

* fix error

* fix

* fix

* del unittests

* fix paddle_build.sh

* fix

* fix test

* fix add test

* fix

* fix

* fix

* merge develop

* fix

* fix

* fix

* fix

* fix

* merge develop

* fix test_async_read_write

* fix test_async_read_write

* merge develop

* fix

* fix import legacy_test

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix bug

* fix

* fix coverage test bug

* fix

* fix

* fix

* fix

* fix

* fix code sstyle

* fix code

* fix code

* fix

* fix

* fix

* del test_sequence_enumerate_op.py

* fix

24 May, 2023 1 commit

Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas. (#53622) · f4abe34b

Authored by Yiqun Liu on May 24, 2023

* Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas.

* Change the repeat of cublaslt to 10.

* Use FLAGS_cublaslt_exhaustive_search_times as repeats.

* Fix compiling error on CI.

* Polish the key and simplify codes.

23 May, 2023 2 commits
- fix typos(#53967) · c36a000d
  Authored by cyberslack_lee on May 23, 2023
- move fusion_group infershape to phi (#53934) · 3dc99088
  Authored by huangjiyi on May 23, 2023
```
* update

* update

* update

* set out dtype
```
19 May, 2023 1 commit

Add flash attention to speedup fused_gate_attention. (#52731) · d29c1f8e

Authored by limingshu on May 19, 2023

* Reorganize the forward codes of flash-attention.

* Fix forward.

* Remove some noused codes.

* Simplify codes and fix backward.

* Change all LOG(INFO) to VLOG and fix the backward.

* add scale for AF2 flash_attn, much thanks to xreki and shaojie for debug these codes

* decrease the effect of debug print on performance

* Unify the initialize of flashattn arguments.

* Rewirte the reshape of temp_mask and temp_bias.

* API support use_flash_attn.

* Fix compiling error on CI.

* Try to crop the flash-attention lib.

* Correct the condition of whether can use flash-attn.

* Remove the softmax_out argument.

* Remove is_causal.

* Polish codes.

* Fix qkv_transpose_out's shape and scaling of Q * K.

* Update commit of flash-attention.

---------
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

18 May, 2023 2 commits

Fused elementwises kernels and ops (#51427) · fb4a6ecf

Authored by Hulek on May 18, 2023

* Fused elementwises kernels and ops

* change fuse pass name

* adjust .pbtxt files

* adjust quantization attributes

* add missing arguments and fix others, review fixed

* simplify fused kernel registration

* fix elementwise unit tests

* reuse one fused elementwise op

* adjust proto

* Add supported datatypes

* Change 'Scale' to 'scale' in tests, change some tests to onednn

* Revert breaking changes

* Fix unit tests

* Delete obsolete test cases

* Delete commented out code

* Fix codestyle

* delete temporary condition

* fix conflicts and delete duplicate fusing

* Fix code after merge

* Move tests to new directory

* fix tests volatility

* Rename test_elementwise_add_onednn_op.py to test_elementwise_add_mkldnn_op.py

* Update CMakeLists.txt add mkldnn op test

---------
Co-authored-by: Silv3S <slawomir.siwek@intel.com>

move fusion_group kernel to phi (#53781) · 26da689d

Authored by huangjiyi on May 18, 2023

16 May, 2023 2 commits

remove some [-Wunused-parameter] warning and fix a file to pass cpplint (#53814) · 10a38b4e

Authored by Galaxy1458 on May 16, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop
```

Move fused batchnorm to Phi (#53476) · 5e5481d8

Authored by Sonder on May 16, 2023

* trans fused batch norm Compute function

* trans batch norm register info to phi

* trans fused batch norm grad Compute

* trans batch norm grad register info

* add sig file

* update sig file

* Update fused_bn_activation_kernel.cu

* Update fused_bn_activation_grad_kernel.cu

* fix

* Rename fused_bn_activation_kernel_grad.cu to fused_bn_activation_kernel.cu

* fix

* fix

* fix CudnnDataType error

* fix

* fix include

* update

* add #if

* add fused bn act to cmakelist.txt

* update  cmakelist

* fix #ifdef error

* add timeout set

* add env set

* fix

* fix

* Update fused_bn_activation_sig.cc

15 May, 2023 1 commit

remove some [-Wunused-paramter]warning (#53681) · 96188fc1

Authored by Galaxy1458 on May 15, 2023

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

11 May, 2023 1 commit
- [XPU] update log for bkcl function calls. (#53609) · d67d74cc
  Authored by houj04 on May 11, 2023
```
* [XPU] update log for bkcl function calls.

* minor update

* revert unnecessary modifications.
```
09 May, 2023 1 commit
- remove some [-Wunused-parameter]warning (#53617) · bafc3469
  Authored by Galaxy1458 on May 09, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop
```
05 May, 2023 1 commit
- [test]mv fluid op fused to test/cpp/fluid/fused (#53434) · 903c5638
  Authored by gouzil on May 05, 2023

28 Apr, 2023 1 commit

Dropout optimize & clean broadcast inT and ElementwiseType (#52969) · d611e48c

Authored by Bo Zhang on Apr 28, 2023

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* clean ElementwiseT and InT for BroadcastKernel

* default axis and clean inT

* remove redundant fast divmod computation

* optimize drop_nd & drop_nd_grad

* optimize BroadcastDataLoader bf16 fp16

* rm InT etc. after merge develop

* delete constexpr for windows ci

* fix conflict

* fix conflic with develop

* fix conflic

* new clean

* clean

27 Apr, 2023 2 commits

Move fused feedforward (#53166) · 25b4ba7f

Authored by Sonder on Apr 27, 2023

* trans fused_feedward Compute function to phi

* add register info

* remove maxfunctor

* move fused feedward to phi

* remove sig file

* remove fliud include

* add include

* add include

* add sig file

* add output register info

* fix sig file

* Update fused_feedforward_sig.cc

* fix grad kernel

* update output register info

* fix

* open fused_feedforward static build

* add optional and fix code style

* fix output info for fused attention

* add optional param

* merge

Register fluid xpu kerenls to phi [part 2] (#53188) · eee9c788

Authored by huangjiyi on Apr 27, 2023
```
* update

* fix bug
```

26 Apr, 2023 1 commit
- Register fluid xpu kerenls to phi [part 3] (#53189) · 37489df5
  Authored by huangjiyi on Apr 26, 2023
```
* update

* update
```
25 Apr, 2023 1 commit

[PHI]Add flags macro for PHI (#52991) · 22e96bde

Authored by YuanRisheng on Apr 25, 2023

* add flags for phi

* fix compile bugs

* fix ci bugs

* fix inference bugs

* fix cinn' bugs

* fix cinn bugs

* perfect code according comment

* fix ci bugs

* fix ci bugs

24 Apr, 2023 1 commit

Move fused feedforward xpu (#53196) · 83c2e682

Authored by Sonder on Apr 24, 2023

* add sig file

* trans fused feedforward compute function to phi

* remove fluid include

* delete old register info

* fix build error

* trans fused feedforward grad xpu to phi

19 Apr, 2023 4 commits

Move fused_attention op to phi [migrate XPU OpKernel] [ test=kunlun ] (#53011) · 7b56bd25

Authored by Sonder on Apr 19, 2023

* trans fused attention to phi

* add optional parm

* trans fused_attention_grad to phi

* add fused attention grad register info

* fix include

* test=kunlun

* add fused attention to static build list

* add remove

* update remove

Register fluid kerenls to phi [part 11] (#53035) · abc44b40

Authored by huangjiyi on Apr 19, 2023
```
* update

* fix bug

* fix bug

* fix bug

* fix bug
```

Support Linear operation in cuBlaslt and plug into attn_gemm and fusedLinear backward op (#52028) · f6f18835

Authored by limingshu on Apr 19, 2023

* first commit

* restruct c++ interface to divide linear from matmulwithcublaslt

* finish building in cublaslt impl

* fix code bugs

* fix host cost

* add some changes

Register fluid kerenls to phi [part 13] (#53037) · d9edb233

Authored by huangjiyi on Apr 19, 2023
```
* update

* fix bug

* update

* fix bug
```

18 Apr, 2023 1 commit
- register fluid kerenls to phi [part 6.5] (#52882) · cb81befa
  Authored by huangjiyi on Apr 18, 2023
```
* update

* fix bug

* update

* fix bug
```
17 Apr, 2023 1 commit
- [PHI]Unify fluid kernel (Part4) (#52626) · 1b5eba8a
  Authored by YuanRisheng on Apr 17, 2023
```
* unify kernel

* fix ci bugs

* fix py3 bugs

* fix py3 bugs

* perfect code
```
14 Apr, 2023 1 commit

Move fused_attention op to phi [migrate backward GPU OpKernel] (#51909) · 3bac6264

Authored by Sonder on Apr 14, 2023

* add kernel functions

* update kernel functions

* update func parameters' name

* create codes for gpu device

* adjust file locations

* fix include error

* remove dependent files to phi/

* restore fused_attention_op.cu

* fix dependence errors

* fix dependence errors

* fix include error

* fix all depandence errors[build success]

* remove useless include

* recover useless include

* use phi::ToNCCLDataType

* fix namespace

* update new register code

* fix error in fused_gemm_epilogue_utils

* fix error in FusedAttentionKernel parm

* finish fused_attention registe code[build success]

* add paddle::optional

* add sig file

* fix build error

* fix a include error

* restore forward code

* update CMkaeList

* trans Compute function to phi [build success]

* add register code and fix include error [build success]

* fix parameter sequence

* add include file

* update #if before include

* update #if before include

* fix grammly error

* update codes for DropoutParam

* remove const cast

* trans some fluid api to phi api

* remove const cast

* trans some fluid api to phi api

* add #if

* update test code

* update test codes

* recover test codes

* fix namespace and remove fluid include

* recover random seed

* remove fluid quant_helper

* fix include error

* include utils in funcs

* change include file

* move grad codes back to fluid floder

* move grad codes back to fluid floder

* fix sig file error

* update include

* recover codes to develop

* update register codes

* fix build error

* recover fluid include

* remove some fluid include

* remove some fluid include

* Update fused_attention_op.cu

* remove fluid include

* add some fluid include

* Update fused_attention_op.cu

* Update fused_attention_op.cu

* Update fused_attention_op.cu

* Update fused_attention_op.cu

* remote useless include

10 Apr, 2023 1 commit
- fix gcc12 error (#52646) · 66a4804b
  Authored by risemeup1 on Apr 10, 2023

06 Apr, 2023 1 commit

Move fused_attention op to phi [migrate forward GPU OpKernel] (#51743) · a7ec8958

Authored by Sonder on Apr 06, 2023

* add kernel functions

* update kernel functions

* update func parameters' name

* create codes for gpu device

* adjust file locations

* fix include error

* remove dependent files to phi/

* restore fused_attention_op.cu

* fix dependence errors

* fix dependence errors

* fix include error

* fix all depandence errors[build success]

* remove useless include

* recover useless include

* use phi::ToNCCLDataType

* fix namespace

* update new register code

* fix error in fused_gemm_epilogue_utils

* fix error in FusedAttentionKernel parm

* finish fused_attention registe code[build success]

* add paddle::optional

* add sig file

* fix build error

* fix a include error

* update CMkaeList

* fix parameter sequence

* add include file

* update #if before include

* fix grammly error

* update codes for DropoutParam

* remove const cast

* trans some fluid api to phi api

* add #if

* update test code

* update test codes

* recover test codes

* trans fused_attention to fluid

* move #endif to end

* move #endif

* delete useless files

* use fused attention utils and recover random seed

* remove fluid include in phi

04 Apr, 2023 1 commit
- register fluid kerenls to phi [part 5] (#52486) · eb38c85f
  Authored by huangjiyi on Apr 04, 2023
```
* update

* fix bug

* update

* fix bug
```
30 Mar, 2023 1 commit

Speedup worker (#51760) · 8ca86d72

Authored by pangengzheng on Mar 30, 2023

* support run haokanctr model in heterps-models

* polish setup.py

* polish JVM_LIB in evn_dict

* align infer auc with DistPsArch pre-stable

* async and multi thread data feed

* rewrite dense tensor intialization

* async infer shape and reuse memory

PaddlePaddle / Paddle: last synced successfully about 1 year ago