Commits · 2944d3c01d760f7c8fb2e8072fc98c0ca34c3fa0 · PaddlePaddle / Paddle

Apr 13, 2023 · 1 commit

[cutlass] Sparse conv3d backward fusion (#52361) · 0b98d1aa

Committed by umiswing on Apr 13, 2023

Apr 10, 2023 · 1 commit

[enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc (#52573) · 3c0b1795

Committed by HongyuJia on Apr 10, 2023

* [enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc

* Add gflags.h for other files

* Add gflags.h for other files

* Add gflags.h for blas_impl.hip.h

* Add gflags.h for miopen_helper.h

Mar 20, 2023 · 1 commit

Support Linear operation in cuBlaslt and plug into attn_gemm and fusedLinear forward op (#51124) · 2dfc3fa8

Committed by limingshu on Mar 20, 2023

* optimization for fused linear op

* fix code format

* optimization for linear fused forward

* merge with develop

* fix bugs for gemm_ephilog

* package of cublaslt ephilogue type with enmu

* final fix before code reviewing

* fix missed fusedType typo

* fix code according to review suggestions

* fix windows ci error

* change location of MatmulPlanner

* add some changes for compiler error fix

---------

Mar 15, 2023 · 1 commit

Auto tune for cutlass (#50809) · 12d43da9

Committed by umiswing on Mar 15, 2023

Mar 2, 2023 · 1 commit

Cache for cublaslt descriptor (#50931) · 819f8939

Committed by limingshu on Mar 2, 2023

* first commit

* finish base work

* modification for good

* fix for cache setting and gather the algo and desc as one data for cache storage

* fix for cache setting and gather the algo and desc as one data for cache storage

* install pre-commit check

Feb 26, 2023 · 1 commit

Matmul performance optimization with cuBlasLt (#46431) · d4217fc6

Committed by limingshu on Feb 26, 2023

* implement of matmul using cublasLt instead of cublas

* Update matmul_kernel_impl_via_blasLt.h

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

Feb 21, 2023 · 1 commit

Addition of marco for auto_tune_base.h (#50516) · 27281e1f

Committed by limingshu on Feb 21, 2023

Jan 25, 2023 · 1 commit

remove useless kTranspose enum element (#38660) · f43cb3b7

Committed by limingshu on Jan 25, 2023

Co-authored-by: zhangbopd <1299246947@qq.com>

Dec 14, 2022 · 1 commit

Divide elementwise case from BroadcastKernel and refine transpose autotune (#33051) · 6c9df13d

Committed by limingshu on Dec 14, 2022

* First Commit.

* add some codes

* add elementwise loader

* fix code styles

* merge with develop

* add some changes both in elementwise and transpose

* add init operation in broadcast kernel.

* change codes according to pr suggestions about transpose file

* fix error for op-benchmark ci

* fix according to ci

Nov 24, 2022 · 1 commit

[PHI decoupling] remove "paddle/fluid/platform/enforce.h" in phi (#48049) · df23c7c3

Committed by PuQing on Nov 24, 2022

Nov 18, 2022 · 1 commit

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

Committed by Tian Zheng on Nov 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in  conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* - move plan building outside of cache

* Fix ROCM build

Nov 11, 2022 · 1 commit

Simplify the autotune cache codes. (#47667) · 8758a338

Committed by Yiqun Liu on Nov 11, 2022

Nov 10, 2022 · 1 commit

[PHI Decoupling] remove dependency on "paddle/fluid/platform/errors.h" and "paddle/fluid/platform/fast_divmod.h" in phi. (#47815) · 4c375454

Committed by huangjiyi on Nov 10, 2022

* rm "paddle/fluid/platform/errors.h" in phi

* rm "paddle/fluid/platform/fast_divmod.h" in phi

Nov 8, 2022 · 1 commit

normalize autotune tests dir (#47726) · 6bab3343

Committed by Chen Weihang on Nov 8, 2022

Nov 1, 2022 · 1 commit

Fix bugs in tranpose kernel (#47212) · ec7fe888

Committed by limingshu on Nov 1, 2022

* first commit

* transpose_kernel_optimization

* first complishment of transpose op

* second commit

* refine code logics of tranpose_kernel

* refine transpose kernel

* first commit

* fix DtoD copy bugs for hip

* refine code according to the PR advice

* change dim to int64_t type.

* fix some type error

Oct 19, 2022 · 1 commit

Enable to record whether the conv algo is got by exhaustive search to fix autotune cache bug. (#47065) · 3bc4b850

Committed by Yiqun Liu on Oct 19, 2022

Sep 28, 2022 · 1 commit

first commit (#46525) · 806b252c

Committed by limingshu on Sep 28, 2022

Sep 22, 2022 · 1 commit

[Code Clean] Clarify once_flag setting for kernel autotune module (#44141) · 66a4b2e8

Committed by limingshu on Sep 22, 2022

* first commit

* clarify the quotes

* change code style format

* rerun for ci

Sep 14, 2022 · 1 commit

Simplify the codes of conv. (#45966) · 3a5b5048

Committed by Yiqun Liu on Sep 14, 2022

Aug 25, 2022 · 1 commit

optimize conv algo cache (#41891) · 1cd7e68b

Committed by hong on Aug 25, 2022

* optimizer conv alog speed

* code polish

* remove useless code

* fix compile error

* fix cpu compile error

* not use cudnn alog t

* add search cache max number

* polish code

* fix cache test bug

* add groups data format to conv args

* fix cache test bug

* fix cudnn_deterministic bug

* fix test switch auto tune bug

* fix test swith autotune bug;

* fix conv cache bug

* fix cache test error

* fix cache test bug

* fix windows mac compile error

* fix workspace search error

* update cudnn cache

* fix cache test bug; test=develop

* fix autotune swith test error

* polish code

* oplish code

Jul 15, 2022 · 1 commit

Remove boost library (#44092) · d2e59e15

Committed by Ruibiao Chen on Jul 15, 2022

Jul 1, 2022 · 1 commit

Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3

Committed by limingshu on Jul 1, 2022

* 2nd part of transpose update

* add switch_auto_tune option.

* add some changes according to Ci

* refine the structure of auto_tune_base.

* merge develop changes

* reset the switch_set_range and change unittest of transpose auto-tune

* change the kernel auto-tune logits

Jun 24, 2022 · 1 commit

[Phi]Change Copy from Kernel to basic component utils (#43622) · 2739bd73

Committed by YuanRisheng on Jun 24, 2022

* perfect copy

* deal with conflict

* deal with conflict

* fix compile bugs

* fix unittest bugs

* change code format

* deal with conflict

* modify code by review

* fix ce bugs

* fix ce bugs

* add lo

* perfect code format

* deal with conflicts

Jun 7, 2022 · 1 commit

Transpose optimization with assitant of Chengdu Supercomputing Center and auto_tune operation (#42704) · 71a63f0a

Committed by limingshu on Jun 7, 2022

Jun 5, 2022 · 1 commit

【code format check upgrade】 step2：clang-format (#42840) · a3730dc8

Committed by Sing_chan on Jun 5, 2022

Jun 4, 2022 · 1 commit

【code format check upgrade】 step2：cmake-format (#43057) · 92568edb

Committed by Sing_chan on Jun 4, 2022

Apr 15, 2022 · 1 commit

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

Apr 9, 2022 · 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 9, 2022

* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kind of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

Apr 6, 2022 · 1 commit

fix bug of missing boost when compile cache.cc (#41430) · 5c6e4bff

Committed by Sing_chan on Apr 6, 2022

Apr 5, 2022 · 1 commit

Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e

Committed by Zhang Ting on Apr 5, 2022

* switch autotune

* implement AutoTuneCache

* implement AutoTuneCache class

* add pybind api

* add dygraph test

* support static mode and eager mode and improve unittests

* rename the SwitchAutoTune Class and improve tests

* improve AutoTuneStatus and reduce the cost of tests

Mar 31, 2022 · 2 commits

Implement AutotuneCache class for Kernel AutoTune (#41169) · 7dfd3846

Committed by Zhang Ting on Mar 31, 2022

add_autotune_kernel_tool (#40658) · 7c5dca9f

Committed by limingshu on Mar 31, 2022

* for 1st time interface combine.

* modification with kernel factory

* first auto_tune version.

* first version.

* basic version

* add warm up step.

* a debug version.

* optimize the functionality of class auto_tuner.

* add some quotes for optimized auto_tuner class.

* add some quotes for optimized auto_tuner class.

* add namespace.

* modification according to the advices

* replace fluid header with phi header.

* replace fluid header with phi header.

Mar 25, 2022 · 1 commit

Implement a common AlgorithmsCache for kernel auto-tune (#40793) · 01b688c0

Committed by Zhang Ting on Mar 25, 2022

Mar 23, 2022 · 1 commit

Add Gpu Timer Tool (#40642) · 291d8941

Committed by Zhang Ting on Mar 23, 2022

* add kernel profiler

* add gpu timer tool

* remove warmup

* fix rocm complilation error
