Commits · a4d9851b89d8c4f3c33d556dd93ded40c254eb3a · PaddlePaddle / Paddle

08 Dec, 2022 2 commits

[PHI decoupling] move cuda_graph from fluid to phi (#48686) · a4d9851b

Committed by huangjiyi on Dec 08, 2022

* move cuda_graph from fluid to phi

* move device_memory_aligment from fluid to phi

* Revert "move device_memory_aligment from fluid to phi"

This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.

* update xpu cmake

remove gpu_info.h from phi dependencies (#48811) · 73688894

Committed by Netpunk on Dec 08, 2022

28 Nov, 2022 1 commit

[PHI decoupling] move several header files from fluid to phi (#48415) · fd9c91c3

Committed by huangjiyi on Nov 28, 2022

* decouple cudnn_desc.h from fluid

* move cudnn_desc.h from fluid to phi

* fix bugs

* decouple cudnn_helper.h from fluid

* fix bugs

* move cudnn_helper.h from fluid to phi

* add fluid cudnn_helper.h

* move miopen_desc.h from fluid to phi

* move miopen_helper.h from fluid to phi

* fix bugs

* move gpu_dnn.h from fluid to phi

* fix bugs

* update copyright year

* simplify gpu_dnn.h in fluid

* fix bugs

* fix xpu build bug

* fix compile bug

* fix bug

18 Nov, 2022 1 commit

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

Committed by Tian Zheng on Nov 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* move plan building outside of cache

* Fix ROCM build

07 Nov, 2022 1 commit

Define ConvRunner to wrap the calls to cudnn conv functions. (#47576) · c331e2ce

Committed by Yiqun Liu on Nov 07, 2022

* Define ConvRunner to wrap the calls to cudnn conv functions.

* Use ConvKind in SearchAlgorithm.

24 Oct, 2022 2 commits

Enhance the implementation of some conv functions. (#47281) · bc47e7ac

Committed by Yiqun Liu on Oct 24, 2022

Move the header file of conv cudnn and miopen to phi directory. (#47248) · 31f57f29

Committed by Yiqun Liu on Oct 24, 2022

19 Oct, 2022 1 commit

Enable recording whether the conv algo was obtained by exhaustive search to fix autotune cache bug. (#47065) · 3bc4b850

Committed by Yiqun Liu on Oct 19, 2022

28 Sep, 2022 1 commit

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

Committed by Chen Weihang on Sep 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

19 Sep, 2022 1 commit

Fix wrong eigen header include (#46082) · 59a2a987

Committed by zyfncg on Sep 19, 2022

* fix wrong eigen header include

* fix compile bug

* fix nan_inf_utils_detail

* fix resource_manager

* fix conv_miopen_helper

14 Sep, 2022 1 commit

Simplify the codes of conv. (#45966) · 3a5b5048

Committed by Yiqun Liu on Sep 14, 2022

25 Aug, 2022 1 commit

optimize conv algo cache (#41891) · 1cd7e68b

Committed by hong on Aug 25, 2022

* optimize conv algo speed

* code polish

* remove useless code

* fix compile error

* fix cpu compile error

* not use cudnn algo t

* add search cache max number

* polish code

* fix cache test bug

* add groups data format to conv args

* fix cache test bug

* fix cudnn_deterministic bug

* fix test switch auto tune bug

* fix test switch autotune bug;

* fix conv cache bug

* fix cache test error

* fix cache test bug

* fix windows mac compile error

* fix workspace search error

* update cudnn cache

* fix cache test bug; test=develop

* fix autotune switch test error

* polish code

* polish code

26 Jun, 2022 1 commit

format all files in fluid using new config (#43776) · 576236a0

Committed by Sing_chan on Jun 26, 2022

27 May, 2022 1 commit

Support memory stats for CPU (#42945) · 21f11d35

Committed by Ruibiao Chen on May 27, 2022

* Support memory stats for CPU

* Add UTs

* Fix typos

* Fix typos

15 Apr, 2022 1 commit

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

14 Apr, 2022 1 commit

Optimize the finding of max workspace size. (#41741) · 3ce879db

Committed by Yiqun Liu on Apr 14, 2022

09 Apr, 2022 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between the two kinds of workspace setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

04 Mar, 2022 1 commit

Move conv to pten (#39354) · d50fb43e

Committed by hong on Mar 04, 2022

* move conv to pten

* move conv to pten; test=develop

* fix bug;

* add conv cudnn impl; test=develop

* update

* update operator; test=develop

* fix bug; test=develop

* move operator and prepared_operator to develop; test=develop

* resolve conflict; test=develop

* remove useless code;test=develop

* add dependency; test=develop

* fix bug;

* add sig.cc ; test=develop

* fix use_op error; test=develop

* fix bug; test=develop

* fix bug; test=develop

* add conv3d register; test=develop

* fix star gan and conv_nn_grad test failed; test=develop

* add header; test=develop

* manually recover to develop;

* resolve conflict; test=develop

* remove useless code

* fix bug;

* remove conv2d_cudnn; test=develop

* fix bugs; test=develop

* fix cpu rocm compile bugs; test=develop

* fix blas error; test=develop

* fix compile bug; test=develop

* fix windows compile error; test=develop

* fix windows error; test=develop

* resolve conflict; test=develop

20 Feb, 2022 1 commit

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

19 Feb, 2022 1 commit

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

Committed by Aurelius84 on Feb 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile successfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix test file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

08 Feb, 2022 1 commit

[PTEN] Update gpu_context. (#39359) · 24103cbb

Committed by Wilber on Feb 08, 2022

* gpu_context..

* update

* update

* update

25 Jan, 2022 1 commit

GetWorkspaceSize trigger modification in heuristic cudnn conv (#39184) · 4c61e141

Committed by limingshu on Jan 25, 2022

* first commit

* add more changes

30 Dec, 2021 1 commit

first commit (#38590) · ebc72ac2

Committed by limingshu on Dec 30, 2021

03 Dec, 2021 1 commit

refine structure for cuda and rocm (#37202) · a6d2fddb

Committed by ronnywang on Dec 03, 2021

* refine structure for cuda and rocm

* update

* update

* update

* update

08 Oct, 2021 1 commit

Support CUDA Graph on ParallelExecutor (#36250) · f9591bb1

Committed by Zeng Jinle on Oct 08, 2021

* support CUDA Graph on PE

* add ut, fix CI compile

* reduce memory consumption

* fix CUDA 10 CI

* improve coverage

* improve python coverage

04 Aug, 2021 1 commit

Set Tensor Core MathType for bfloat16 in conv using cudnn (#34409) · c79fa1c3

Committed by Lijunhui on Aug 04, 2021

02 Jun, 2021 1 commit

conv2d support bfloat16 (#32221) · 5981bee2

Committed by wuhuanzhou on Jun 02, 2021

26 May, 2021 1 commit

optimize OP's compilation time (#32617) · 78ecb668

Committed by wuhuanzhou on May 26, 2021

* optimize OP's compilation time, test=develop

* add more op and run ci test, test=develop

* CUDA Kernel register in cc file, test=develop

* fix macros, test=develop

* fix undefined symbol error, test=develop

* fix compilation error and undefined symbol, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

18 Feb, 2021 1 commit

enable exhaustive_search for forward and backward algos when dtype is float16 (#30959) · f0ee1592

Committed by Zhang Ting on Feb 18, 2021

* enable exhaustive_search for input_grad when dtype is float16

* enable exhaustive_search for forward algos

11 Jan, 2021 1 commit

Add tf32 switch for cuDNN (#29192) · 924aac22

Committed by AshburnLee on Jan 11, 2021

20 Nov, 2020 1 commit

fix the number of perf algo for conv cudnn in exhaustive mode (#28694) · 8b853b30

Committed by wangchaochaohu on Nov 20, 2020

16 Nov, 2020 1 commit

Fix cudnn workspace limit in cudnn-8 (#28611) · f962bd34

Committed by Leo Chen on Nov 16, 2020

14 Oct, 2020 1 commit

tune backward filter algorithm for float16 (#27529) · d5cc144c

Committed by Zhang Ting on Oct 14, 2020

* use exhaustive_search for float16

* tune algo only when dtype is float16

23 Sep, 2020 1 commit

[bug fix]:Memory increases after adapting the cudnn version to cudnn8 (#27436) · c17f9cf2

Committed by Shang Zhizhou on Sep 23, 2020

* [bug fix]:Memory increases after adapting the cudnn version to 8

* [bug fix]cudnnGetConvolutionForwardAlgorithm not defined

05 Aug, 2020 1 commit

[CUDNN8 support] : support CUDNN8 (#25664) · 358bc06c

Committed by Zhaolong Xing on Aug 05, 2020

* cudnn8 support; test=develop

* fix ci error; test=develop

27 May, 2020 1 commit

fix conv_transpose Op fp16 error test=develop (#24695) · 355caee1

Committed by wangchaochaohu on May 27, 2020

12 Apr, 2020 1 commit

fix bug for exhaustive_search in conv_fusion_op, test=develop (#23727) · b4b6763a

Committed by zhongpu on Apr 12, 2020

03 Apr, 2020 1 commit

support Exhaustive search in dygraph (#23415) · dbfbd7ea

Committed by zhongpu on Apr 03, 2020

* use global conv cache; test=develop

* use singleton cache; test=develop

* fix format error; test=develop

* add cudnn helper header; test=develop

* fix header error; test=develop

* fix mac unittest; test=develop

* fix mac unittest; test=develop

* fix file format; test=develop

* fix include file error, test=develop

* remove kernel_configs_ in class ExecutionContext and kernel_configs_map_ in class OperatorWithKernel, test=develop

* fix test_elementwise_mul_op_dim, test=develop

* fix compile error, test=develop

Co-authored-by: phlrain <phliuhongyu@126.com>

02 Apr, 2020 2 commits

Revert "Exhaustive search (#22821)", test=develop (#23401) · bfb07aaf

Committed by zhongpu on Apr 02, 2020

This reverts commit 48144e40.

Exhaustive search (#22821) · 48144e40

Committed by zhongpu on Apr 02, 2020

* use global conv cache; test=develop

* use singleton cache; test=develop

* fix format error; test=develop

* add cudnn helper header; test=develop

* fix header error; test=develop

* fix mac unittest; test=develop

* fix mac unittest; test=develop

* fix file format; test=develop

* fix include file error, test=develop

* remove kernel_configs_ in class ExecutionContext and kernel_configs_map_ in class OperatorWithKernel, test=develop

* fix test_elementwise_mul_op_dim, test=develop

Co-authored-by: phlrain <phliuhongyu@126.com>

PaddlePaddle / Paddle: synced successfully about 1 year ago