Commits · 1c858591eafe233cbf164a2a2d10f1b11c550ad3 · PaddlePaddle / Paddle

14 Jul, 2023 1 commit

Update CUDNN Frontend API to v0.9.1 (#54949) · 76b77d81

Authored by Tian Zheng on Jul 14, 2023

* Update CUDNN Frontend API to v0.9.1
- Remove old patches
- Remove workarounds that are no longer needed

* Fix test_switch_autotune

76b77d81

26 May, 2023 1 commit

[PHI Decoupling]Create PHI shared lib (#53735) · da50a009

Authored by YuanRisheng on May 26, 2023

* create phi so

* fix ci bugs

* fix py3 bugs

* add file

* fix py3 bugs

* fix windows bugs

* perfect so

* fix py3 bugs

* delete all static target in phi

* fix windows bugs

* fix py3 bugs

* fix ci bugs

* fix windows bugs

* fix bugs: gflags can't be linked by dynamic and static lib

* fix bugs that can not load 3rd party

* fix ci bugs

* fix compile bugs

* fix py3 bugs

* fix conflict

* fix xpu bugs

* fix mac compile bugs

* fix psgpu bugs

* fix inference failed

* deal with conflict

* fix LIBRARY_PATH bug

* fix windows bugs

* fix onednn error

* fix windows compile bugs

* fix windows compile bugs

* fix test_cuda_graph_static_mode_error aborted

* fix windows bugs

* fix mac-python3 error

* fix hip compile bugs

* change mode to static

* change to static mode

* fix ci bugs

* fix py3 bugs

* fix windows bugs

* fix bugs

* add static flag

* add PADDLE_API

* change position of PADDLE_API

* fix windows bugs

* change mode to dynamic lib

* fix windows static bugs

* deal with conflict

* fix windows unit bug

* fix coverage

* deal with conflict

* fix windows-inference

* fix py3 bugs

* fix bugs when compile type_info

* fix compile bugs

* fix py3 bugs

* fix windows bugs

* fix windows openblas

* fix xpu bugs

* fix enforce_test in windows

* update code according comment

* fix windows cmake bug

* fix windows bugs

* fix windows bugs

* delete cinn unittest

* fix cinn bugs

---------
Co-authored-by: lzydev <1528794076@qq.com>

da50a009

24 May, 2023 1 commit

Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas. (#53622) · f4abe34b

Authored by Yiqun Liu on May 24, 2023

* Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas.

* Change the repeat of cublaslt to 10.

* Use FLAGS_cublaslt_exhaustive_search_times as repeats.

* Fix compiling error on CI.

* Polish the key and simplify codes.

f4abe34b

09 May, 2023 1 commit

remove some [-Wunused-parameter]warning (#53617) · bafc3469

Authored by Galaxy1458 on May 09, 2023

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

bafc3469

19 Apr, 2023 1 commit

Support Linear operation in cuBlaslt and plug into attn_gemm and fusedLinear backward op (#52028) · f6f18835

Authored by limingshu on Apr 19, 2023

* first commit

* restruct c++ interface to divide linear from matmulwithcublaslt

* finish building in cublaslt impl

* fix code bugs

* fix host cost

* add some changes

f6f18835

13 Apr, 2023 1 commit

[cutlass] Sparse conv3d backward fusion (#52361) · 0b98d1aa

Authored by umiswing on Apr 13, 2023

0b98d1aa

10 Apr, 2023 1 commit

[enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc (#52573) · 3c0b1795

Authored by HongyuJia on Apr 10, 2023

* [enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc

* Add gflags.h for other files

* Add gflags.h for other files

* Add gflags.h for blas_impl.hip.h

* Add gflags.h for miopen_helper.h

3c0b1795

20 Mar, 2023 1 commit

Support Linear operation in cuBlaslt and plug into attn_gemm and fusedLinear forward op (#51124) · 2dfc3fa8

Authored by limingshu on Mar 20, 2023

* optimization for fused linear op

* fix code format

* optimization for linear fused forward

* merge with develop

* fix bugs for gemm_ephilog

* package of cublaslt ephilogue type with enmu

* final fix before code reviewing

* fix missed fusedType typo

* fix code according to review suggestions

* fix windows ci error

* change location of MatmulPlanner

* add some changes for compiler error fix

---------

2dfc3fa8

15 Mar, 2023 1 commit

Auto tune for cutlass (#50809) · 12d43da9

Authored by umiswing on Mar 15, 2023

12d43da9

02 Mar, 2023 1 commit

Cache for cublaslt descriptor (#50931) · 819f8939

Authored by limingshu on Mar 02, 2023

* first commit

* finish base work

* modification for good

* fix for cache setting and gather the algo and desc as one data for cache storage

* fix for cache setting and gather the algo and desc as one data for cache storage

* install pre-commit check

819f8939

26 Feb, 2023 1 commit

Matmul performance optimization with cuBlasLt (#46431) · d4217fc6

Authored by limingshu on Feb 26, 2023


* implement of matmul using cublasLt instead of cublas

* Update matmul_kernel_impl_via_blasLt.h

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

d4217fc6

21 Feb, 2023 1 commit

Addition of marco for auto_tune_base.h (#50516) · 27281e1f

Authored by limingshu on Feb 21, 2023

27281e1f

25 Jan, 2023 1 commit

remove useless kTranspose enum element (#38660) · f43cb3b7

Authored by limingshu on Jan 25, 2023

Co-authored-by: zhangbopd <1299246947@qq.com>

f43cb3b7

14 Dec, 2022 1 commit

Divide elementwise case from BroadcastKernel and refine transpose autotune (#33051) · 6c9df13d

Authored by limingshu on Dec 14, 2022

* First Commit.

* add some codes

* add elementwise loader

* fix code styles

* merge with develop

* add some changes both in elementwise and transpose

* add init operation in broadcast kernel.

* change codes according to pr suggestions about transpose file

* fix error for op-benchmark ci

* fix according to ci

6c9df13d

24 Nov, 2022 1 commit

[PHI decoupling] remove "paddle/fluid/platform/enforce.h" in phi (#48049) · df23c7c3

Authored by PuQing on Nov 24, 2022

df23c7c3

18 Nov, 2022 1 commit

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

Authored by Tian Zheng on Nov 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in  conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* - move plan building outside of cache

* Fix ROCM build

14a6e67b

11 Nov, 2022 1 commit

Simplify the autotune cache codes. (#47667) · 8758a338

Authored by Yiqun Liu on Nov 11, 2022

8758a338

10 Nov, 2022 1 commit

[PHI Decoupling] remove dependency on "paddle/fluid/platform/errors.h" and... · 4c375454

Authored by huangjiyi on Nov 10, 2022

[PHI Decoupling] remove dependency on "paddle/fluid/platform/errors.h" and "paddle/fluid/platform/fast_divmod.h" in phi. (#47815)

* rm "paddle/fluid/platform/errors.h" in phi

* rm "paddle/fluid/platform/fast_divmod.h" in phi

4c375454

08 Nov, 2022 1 commit

normalize autotune tests dir (#47726) · 6bab3343

Authored by Chen Weihang on Nov 08, 2022

6bab3343

01 Nov, 2022 1 commit

Fix bugs in tranpose kernel (#47212) · ec7fe888

Authored by limingshu on Nov 01, 2022

* first commit

* transpose_kernel_optimization

* first complishment of transpose op

* second commit

* refine code logics of tranpose_kernel

* refine transpose kernel

* first commit

* fix DtoD copy bugs for hip

* refine code according to the PR advice

* change dim to int64_t type.

* fix some type error

ec7fe888

19 Oct, 2022 1 commit

Enable to record whether the conv algo is got by exhaustive search to fix... · 3bc4b850

Authored by Yiqun Liu on Oct 19, 2022

Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065)

3bc4b850

28 Sep, 2022 1 commit

first commit (#46525) · 806b252c

Authored by limingshu on Sep 28, 2022

806b252c

22 Sep, 2022 1 commit

[Code Clean] Clarify once_flag setting for kernel autotune module (#44141) · 66a4b2e8

Authored by limingshu on Sep 22, 2022

* first commit

* clarify the quotes

* change code style format

* rerun for ci

66a4b2e8

14 Sep, 2022 1 commit

Simplify the codes of conv. (#45966) · 3a5b5048

Authored by Yiqun Liu on Sep 14, 2022

3a5b5048

25 Aug, 2022 1 commit

optimize conv algo cache (#41891) · 1cd7e68b

Authored by hong on Aug 25, 2022

* optimizer conv alog speed

* code polish

* remove useless code

* fix compile error

* fix cpu compile error

* not use cudnn alog t

* add search cache max number

* polish code

* fix cache test bug

* add groups data format to conv args

* fix cache test bug

* fix cudnn_deterministic bug

* fix test switch auto tune bug

* fix test swith autotune bug;

* fix conv cache bug

* fix cache test error

* fix cache test bug

* fix windows mac compile error

* fix workspace search error

* update cudnn cache

* fix cache test bug; test=develop

* fix autotune swith test error

* polish code

* oplish code

1cd7e68b

15 Jul, 2022 1 commit

Remove boost library (#44092) · d2e59e15

Authored by Ruibiao Chen on Jul 15, 2022

d2e59e15

01 Jul, 2022 1 commit

Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3

Authored by limingshu on Jul 01, 2022

* 2nd part of transpose update

* add switch_auto_tune option.

* add some changes according to Ci

* refine the structure of auto_tune_base.

* merge develop changes

* reset the switch_set_range and change unittest of transpose auto-tune

* change the kernel auto-tune logits

53d5abe3

24 Jun, 2022 1 commit

[Phi]Change Copy from Kernel to basic component utils (#43622) · 2739bd73

Authored by YuanRisheng on Jun 24, 2022

* perfect copy

* deal with conflict

* deal with conflict

* fix compile bugs

* fix unittest bugs

* change code format

* deal with conflict

* modify code by review

* fix ce bugs

* fix ce bugs

* add lo

* perfect code format

* deal with conflicts

2739bd73

07 Jun, 2022 1 commit

Transpose optimization with assitant of Chengdu Supercomputing Center and... · 71a63f0a

Authored by limingshu on Jun 07, 2022

Transpose optimization with assitant of  Chengdu Supercomputing Center and auto_tune operation (#42704)

71a63f0a

05 Jun, 2022 1 commit

【code format check upgrade】 step2：clang-format (#42840) · a3730dc8

Authored by Sing_chan on Jun 05, 2022

a3730dc8

04 Jun, 2022 1 commit

【code format check upgrade】 step2：cmake-format (#43057) · 92568edb

Authored by Sing_chan on Jun 04, 2022

92568edb

15 Apr, 2022 1 commit

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Authored by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

35acfeda

09 Apr, 2022 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Authored by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kind of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5

06 Apr, 2022 1 commit

fix bug of missing boost when compile cache.cc (#41430) · 5c6e4bff

Authored by Sing_chan on Apr 06, 2022

5c6e4bff

05 Apr, 2022 1 commit

Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e

Authored by Zhang Ting on Apr 05, 2022

* switch autotune

* implement AutoTuneCache

* implement AutoTuneCache class

* add pybind api

* add dygraph test

* support static mode and eager mode and improve unittests

* rename the SwitchAutoTune Class and improve tests

* improve AutoTuneStatus and reduce the cost of tests

b0f8000e

31 Mar, 2022 2 commits

Implement AutotuneCache class for Kernel AutoTune (#41169) · 7dfd3846

Authored by Zhang Ting on Mar 31, 2022

7dfd3846

add_autotune_kernel_tool (#40658) · 7c5dca9f

Authored by limingshu on Mar 31, 2022

* for 1st time interface combine.

* modification with kernel factory

* first auto_tune version.

* first version.

* basic version

* add warm up step.

* a debug version.

* optimize the functionality of class auto_tuner.

* add some quotes for optimized auto_tuner class.

* add some quotes for optimized auto_tuner class.

* add namespace.

* modification according to the advices

* replace fluid header with phi header.

* replace fluid header with phi header.

7c5dca9f

25 Mar, 2022 1 commit

Implement a common AlgorithmsCache for kernel auto-tune (#40793) · 01b688c0

Authored by Zhang Ting on Mar 25, 2022

01b688c0

23 Mar, 2022 1 commit

Add Gpu Timer Tool (#40642) · 291d8941

Authored by Zhang Ting on Mar 23, 2022

* add kernel profiler

* add gpu timer tool

* remove warmup

* fix rocm complilation error

291d8941

PaddlePaddle / Paddle: synced successfully about 1 year ago
