Commits · 766fcdf09bd9afd3fc913987fbae54af20dda185 · PaddlePaddle / Paddle

12 Jul, 2023 · 2 commits

[ONEDNN] Upgrade oneDNN version to v3.1 (#52463) · cfa513f7

Committed by YangQun on Jul 12, 2023

* squash pick the poc code
* fix build after rebase
* fix int8 conv and fc uts
* Fix and clean-up Get_SRC_Scale_Memory
* fix floating point fc uts
* fix test_analyzer_int8_googlenet
* test_analyzer_int8_mobilenetv1
* fix int8 mobilenet v2 and v3
* fix build error after rebase
* [oneDNN] rename library version
* fix conv bias datatype
* try to fix import error
* fix rebase error
* [oneDNN] pack library into python wheel
* add MKLDNN_SHARED_LIB_3 to env_dict
* fix test_analyzer_bert
* fix fill_constant op kernel
* fix ernie and matmul op ut
* fix softplus ut
* fix conv+relu6 fusion ut
* fix hardswish fusion
* fix quant+transpose fusion ut
* fix sgd ut
* fix int8 matmul with flatten
* fix fc+scale fusion
* fix conv/matmul+gelu fusion uts
* fix rebase error
* Revert "fix conv/matmul+gelu fusion uts"
This reverts commit 47eb5e49972bd8f7271a233def9bfb3e98ce78e1.
* upgrade to onednn v3.1
* remove older version onednn
* use densetensor::data() for achieving mean and var in layernorm impl
* comments for atol of integer tests
* fix clang-format
* Revert "remove older version onednn"
This reverts commit 783e57ddfd4401254596eae7d47adb9b03590c09.
* improve binary handle
* fix expand kernel
* Revert "use densetensor::data() for achieving mean and var in layernorm impl"
* always use forward_inference for conv
* remove activation scales
* rollback changes to mkldnn.cmake
* address comments
* port changes to dequantize kernel
* fix merge error
* fix fused_elementwise_kernel
* upgrade onednn version to v3.1.1
* fix some approval error
* fix error msg format
* remove old onednn libs
* try to fix symbolic link issue
* fix cinn test case segfault
* do not explicit link test with onednn
* remove unnecessary changes
* integrate CINN with onednn v3
* link with mkldnn project
* fix cinn build file

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
Co-authored-by: Chen, Xinyu1 <xinyu1.chen@intel.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>


[clang-tidy] enable `readability-container-size-empty` check (#55279) · be3a6fa7

Committed by Wang Xin on Jul 12, 2023

* [clang-tidy] enable readability-container-size-empty check

* fix test_custom_kernel Failed

* add clang-tidy-10 in dockerfile

* add clang-tidy in dockerfile

* fix bug


07 Jul, 2023 · 2 commits
- [XPU] Add layernorm fuse pass (#55154) · eb12739e
  Committed by wz1qqx on Jul 07, 2023
- [CustomDevice] fix resource_pool release bug (#55229) · 6af85a81
  Committed by ronnywang on Jul 07, 2023
04 Jul, 2023 · 1 commit

[XPU] Add XPU plugin support (#55101) · 6d5d9f23

Committed by hong19860320 on Jul 04, 2023

* Add XPU plugin to support the customized ops or improve the performance of the fusion ops based on hand-written xpu micro kernels.

* refine README.md


03 Jul, 2023 · 2 commits
- [XPU] Fix the topk, set_value ops that using temporary tensors avoiding the... · cc2059a0
  Committed by jiangfan06 on Jul 03, 2023
```
[XPU] Fix the topk, set_value ops that using temporary tensors avoiding the memory overlaps during multi-stream inference (#54851)
```
- [CustomDevice] release device manager in py::atexit (#54932) · e5725680
  Committed by ronnywang on Jul 03, 2023
```
* [CustomDevice] release device manager in py::atexit

* fix hip_version macro

* update

* update
```
30 Jun, 2023 · 1 commit
- [XPU] Add conv2d transpose fuse pass (#54904) · 12c15b89
  Committed by mjp9527 on Jun 30, 2023
28 Jun, 2023 · 1 commit
- [ROCM] fix cupti, rccl on rocm (#54807) · 57da105c
  Committed by ronnywang on Jun 28, 2023
```
* [ROCM] fix cupti, hipcub

* update

* update
```
26 Jun, 2023 · 1 commit
- [XPU] support xpu runtime profiler: follow up (#54690) · 9c3f4b13
  Committed by XiaociZhang on Jun 26, 2023
```
* [XPU] support xpu runtime profiler: follow up

* fix compile issue
```
20 Jun, 2023 · 2 commits

[XPU][PHI Kernels] add unique kernel for xpu (#54758) · f836e7d2

Committed by lijin23 on Jun 20, 2023

* add unique kernel for xpu

* add unique kernel for xpu

* update unittest

* add xpu support for unique with axis


[XPU] avoid compile issue in non-xpu env (#54711) · e2690526

Committed by XiaociZhang on Jun 20, 2023

* [kunlun] avoid compile issue in non-xpu env

also rename macro WITH_XPU_XPTI to WITH_XPTI

* move get_xpti_dependency.sh to tools/xpu

* move get_xpti_dependency.sh to tools/xpu

* call get_xpti_dependency.sh only in need


19 Jun, 2023 · 1 commit
- [XPU] add context_gm_size in XpuConfig, don't alloc gm in pass. (#54674) · 52ad918b
  Committed by AlbertVan on Jun 19, 2023
16 Jun, 2023 · 1 commit

[kunlun] support xpu runtime profiler (#54685) · 82eeda69

Committed by jameszhang on Jun 16, 2023

* [kunlun] support xpu runtime profiler

* fix cmake error

* add libxpti.so to paddle package

* fix for style check

* sync change in setup.py and python/setup.py.in

* remove libxpti.so from paddle output dir in this PR


15 Jun, 2023 · 2 commits
- add uint8 custom ccltype (#54671) · 6fc0378a
  Committed by duanyanhui on Jun 15, 2023
- [CustomDevice] add MOE support, PART1 (#54572) · 20db8602
  Committed by ronnywang on Jun 15, 2023
14 Jun, 2023 · 1 commit
- set xpu context at runtime (#54587) · d0d7d01f
  Committed by zhupengyang on Jun 14, 2023
09 Jun, 2023 · 1 commit
- [XPU] add registration of SplitWithNumKernel with int64. (#54478) · 4a77cf53
  Committed by houj04 on Jun 09, 2023
08 Jun, 2023 · 2 commits
- [XPU]add fp16 kernels (#54410) · fd9c555c
  Committed by wz1qqx on Jun 08, 2023
- xpu support auto growth allocator (#54121) · 168fac13
  Committed by ykkk2333 on Jun 08, 2023
02 Jun, 2023 · 2 commits
- [BugFix]Fix TypeInfo errors in MacOS (#54279) · 4f56e7c2
  Committed by YuanRisheng on Jun 02, 2023
```
* fix mac typeinfo bugs

* add file

* move code to cc

* fix compile bugs
```
- [XPU]Add yolo box fuse pass && kernel (#54163) · a087b9cb
  Committed by wz1qqx on Jun 02, 2023
01 Jun, 2023 · 1 commit
- fix xpu-kp bugs (#54234) · e8735ddf
  Committed by YuanRisheng on Jun 01, 2023
26 May, 2023 · 1 commit

[PHI Decoupling]Create PHI shared lib (#53735) · da50a009

Committed by YuanRisheng on May 26, 2023

* create phi so

* fix ci bugs

* fix py3 bugs

* add file

* fix py3 bugs

* fix windows bugs

* perfect so

* fix py3 bugs

* delete all static target in phi

* fix windows bugs

* fix py3 bugs

* fix ci bugs

* fix windows bugs

* fix bugs: gflags can't be linked by dynamic and static lib

* fix bugs that can not load 3rd party

* fix ci bugs

* fix compile bugs

* fix py3 bugs

* fix conflict

* fix xpu bugs

* fix mac compile bugs

* fix psgpu bugs

* fix inference failed

* deal with conflict

* fix LIBRARY_PATH bug

* fix windows bugs

* fix onednn error

* fix windows compile bugs

* fix windows compile bugs

* fix test_cuda_graph_static_mode_error aborted

* fix windows bugs

* fix mac-python3 error

* fix hip compile bugs

* change mode to static

* change to static mode

* fix ci bugs

* fix py3 bugs

* fix windows bugs

* fix bugs

* add static flag

* add PADDLE_API

* change position of PADDLE_API

* fix windows bugs

* change mode to dynamic lib

* fix windows static bugs

* deal with conflict

* fix windows unit bug

* fix coverage

* deal with conflict

* fix windows-inference

* fix py3 bugs

* fix bugs when compile type_info

* fix compile bugs

* fix py3 bugs

* fix windows bugs

* fix windows openblas

* fix xpu bugs

* fix enforce_test in windows

* update code according to comments

* fix windows cmake bug

* fix windows bugs

* fix windows bugs

* delete cinn unittest

* fix cinn bugs

---------
Co-authored-by: lzydev <1528794076@qq.com>


25 May, 2023 · 1 commit
- [Zero-Dim] support ReshapeTransform/nll_loss/matmul support 0D (#53828) · a64a722a
  Committed by zhouweiwei2014 on May 25, 2023
24 May, 2023 · 3 commits

Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas. (#53622) · f4abe34b

Committed by Yiqun Liu on May 24, 2023

* Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas.

* Change the repeat of cublaslt to 10.

* Use FLAGS_cublaslt_exhaustive_search_times as repeats.

* Fix compiling error on CI.

* Polish the key and simplify codes.


[XPU]Add act add fuse (#53965) · f55f9d79

Committed by wz1qqx on May 24, 2023

[XPU][PHI Kernels] bind bitwise_add kernel & add int32/int64 support to... · 0a06140f

Committed by lijin23 on May 24, 2023

[XPU][PHI Kernels] bind bitwise_add kernel & add int32/int64 support to scatter_nd_add kernel for xpu (#54066)

* bind new kernels to xpu

* refine code

* fix bugs in unittest


23 May, 2023 · 2 commits
- [PHI] bind nll_loss xpu kernel (#54043) · 73d706ce
  Committed by RuohengMa on May 23, 2023
- Fix typos (#54015) · adca3654
  Committed by co63oc on May 23, 2023
22 May, 2023 · 2 commits
- [xpu][infer] support runtime configs (#53595) · e135069d
  Committed by zhupengyang on May 22, 2023
- [XPU] batch_norm_grad support float16 for xpu (#53977) · 934d8b89
  Committed by zhangyikun02 on May 22, 2023
19 May, 2023 · 2 commits

[XPU] fix fallback (#53801) · 4b85e5db

Committed by wz1qqx on May 19, 2023

Add flash attention to speedup fused_gate_attention. (#52731) · d29c1f8e

Committed by limingshu on May 19, 2023

* Reorganize the forward codes of flash-attention.

* Fix forward.

* Remove some unused codes.

* Simplify codes and fix backward.

* Change all LOG(INFO) to VLOG and fix the backward.

* add scale for AF2 flash_attn, much thanks to xreki and shaojie for debug these codes

* decrease the effect of debug print on performance

* Unify the initialize of flashattn arguments.

* Rewrite the reshape of temp_mask and temp_bias.

* API support use_flash_attn.

* Fix compiling error on CI.

* Try to crop the flash-attention lib.

* Correct the condition of whether can use flash-attn.

* Remove the softmax_out argument.

* Remove is_causal.

* Polish codes.

* Fix qkv_transpose_out's shape and scaling of Q * K.

* Update commit of flash-attention.

---------
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>


18 May, 2023 · 2 commits
- move fusion_group kernel to phi (#53781) · 26da689d
  Committed by huangjiyi on May 18, 2023
- Add segment_pool tests (#53785) · 0bed2203
  Committed by co63oc on May 18, 2023
15 May, 2023 · 2 commits
- remove some [-Wunused-parameter] warning (#53689) · 3e1fffea
  Committed by Galaxy1458 on May 15, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop
```
- [XPU][PHI] bind index_sample_grad xpu kernel (#53753) · 81056073
  Committed by RuohengMa on May 15, 2023
12 May, 2023 · 2 commits
- [CustomDevice] add inference MP support, PART0 (#53719) · d03bbefa
  Committed by ronnywang on May 12, 2023
```
* [CustomDevice] add inference MP support, PART0

* update
```
- [PHI] update xpu api version; bind reduce_any_bool xpu kernel; remove unnecessary header (#53716) · 0603777b
  Committed by RuohengMa on May 12, 2023

PaddlePaddle / Paddle: synced successfully about 1 year ago
