1. 29 Mar 2023, 1 commit
  2. 24 Mar 2023, 1 commit
    • Memory Efficient Attention (#51867) · e5ad3859
      Committed by ZhangDY-6483 (a sketch of the streaming-softmax trick follows this entry)
      * first version, notest
      
      * return final rst, notest
      
      * use infinity() instead of max
      
      * ut structure
      
      * start up of ut
      
      * generate lse
      
      * update
      
      * add dependence
      
      * reconstruct cmake
      
      * move file
      
      * add memory efficient attention and fix blasimpl
      
      * update
      
      * update cmake
      
      * add namespace
      
      * update cmake
      
      * use .cu
      
      * update for pad3d
      
      * bug fix
      
      * bug fix
      
      * update
      
      * bug fix
      
      * update enforce
      
      * add test case
      
      * merge the lse pad
      
      * fix kernel_fn of backward
      
      * fix PADDLE_ENFORCE_EQ and phi_api
      
      * fix PADDLE_ENFORCE
      
      * fix PADDLE_ENFORCE
      
      * rerun coverage
      
      * fix memory efficient attention test
      
      * rerun ci
      
      * add cuda version condition
      
      * add cuda version condition
      
      * delete WIP test
      
      * replace PADDLE_ENFORCE
      
      * edit the namespace of datatype in multiple.cc
      
      * rerun
      
      * rerun
      
      ---------
      Co-authored-by: liuyuang <liuyuang@baidu.com>
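      Two bullets above, "use infinity() instead of max" and "generate lse", point at the core of
      FlashAttention-style memory-efficient attention: scores are consumed key-block by key-block
      while keeping only a running row maximum (initialized to -infinity) and a log-sum-exp (LSE)
      per query row, so the full N×N probability matrix is never materialized. Below is a minimal
      single-query CPU sketch of that recurrence; it illustrates the idea only, and none of the
      names are taken from Paddle's kernel.

      ```cpp
      #include <algorithm>
      #include <cmath>
      #include <limits>
      #include <vector>

      // Streaming attention for one query row: O(d) extra state (running max m,
      // running denominator l, running numerator acc) instead of O(N) scores.
      std::vector<float> attend_one_query(
          const std::vector<float>& q,               // [d]
          const std::vector<std::vector<float>>& K,  // [N][d]
          const std::vector<std::vector<float>>& V,  // [N][d]
          float* lse_out) {                          // optional per-row LSE
        const size_t d = q.size();
        // -infinity, not numeric_limits::max()/lowest(): the first rescale
        // exp(m - m_new) is then exactly zero, and -inf score masking stays exact.
        float m = -std::numeric_limits<float>::infinity();
        float l = 0.0f;                    // running softmax denominator
        std::vector<float> acc(d, 0.0f);   // running sum of p_j * V[j]

        for (size_t j = 0; j < K.size(); ++j) {
          float s = 0.0f;
          for (size_t t = 0; t < d; ++t) s += q[t] * K[j][t];
          s /= std::sqrt(static_cast<float>(d));

          const float m_new = std::max(m, s);
          const float scale = std::exp(m - m_new);  // rescale old partial sums
          const float p = std::exp(s - m_new);
          l = l * scale + p;
          for (size_t t = 0; t < d; ++t) acc[t] = acc[t] * scale + p * V[j][t];
          m = m_new;
        }
        for (size_t t = 0; t < d; ++t) acc[t] /= l;
        if (lse_out) *lse_out = m + std::log(l);  // the "lse" the commit generates
        return acc;
      }
      ```

      The backward pass can recompute the probabilities blockwise from Q, K, V and this per-row
      LSE alone, which is presumably why the commit stores and pads lse rather than any attention
      matrix.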
  3. 08 Mar 2023, 1 commit
  4. 10 Jan 2023, 1 commit
  5. 27 Dec 2022, 1 commit
  6. 08 Nov 2022, 1 commit
  7. 14 Jun 2022, 1 commit
  8. 04 Jun 2022, 1 commit
  9. 21 Apr 2022, 1 commit
  10. 12 Apr 2022, 1 commit
  11. 22 Mar 2022, 1 commit
  12. 02 Mar 2022, 1 commit
  13. 11 Feb 2022, 1 commit
  14. 31 Aug 2021, 1 commit
    • New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d
      Committed by Zhanlue Yang (a cubin-vs-PTX probe sketch follows this entry)
      [Background]
      Code-size growth tends to be irreversible in the long run, leading to huge release packages
      that not only hamper user experience but also exceed pypi's hard size limit.

      As it stands, the NV_FATBIN section takes up 86% of the compiled dylib's size, owing to the
      large number of GPU arches supported.
      
      This PR aims to prune this NV_FATBIN.
      
      [Solution]
      The new release strategy involves two types of whl packages:

      Cubin PIP package:
      The PIP package keeps a smaller window of GPU-arch support, containing sm_60, sm_70, sm_75,
      and sm_80 cubins, covering Pascal through Ampere.

      JIT release package:
      This is a fallback for the Cubin PIP package, containing compute_35, compute_50, compute_60,
      compute_70, compute_75, and compute_80, with the best performance and GPU-arch coverage.
      However, it takes around 10 min to install due to JIT compilation.
      
      [How to use]
      The new release strategy is disabled by default.
      To build the Cubin PIP package, add -DCUBIN_RELEASE_PIP to the cmake invocation.
      To build the JIT release package, add -DJIT_RELEASE_WHL instead.
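      For context on the two flavors: a sm_XX entry in the fatbin is a cubin, machine code that
      runs only on that major GPU architecture, while a compute_XX entry is PTX that the driver
      JIT-compiles for whatever GPU it meets, hence the broader coverage and the ~10 min cost.
      The probe below, a sketch assuming any trivial kernel will do, shows which path the runtime
      took for the current device:

      ```cpp
      #include <cstdio>
      #include <cuda_runtime.h>

      __global__ void probe() {}

      int main() {
        // binaryVersion is the architecture of the binary actually used for
        // `probe`: taken from an embedded cubin if one matches the device,
        // otherwise produced by JIT-compiling the embedded PTX (ptxVersion).
        cudaFuncAttributes attr;
        if (cudaFuncGetAttributes(&attr, reinterpret_cast<const void*>(probe)) !=
            cudaSuccess) {
          std::printf("no usable image for this GPU in the fatbin\n");
          return 1;
        }
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        std::printf("device sm_%d%d, kernel binaryVersion %d, ptxVersion %d\n",
                    prop.major, prop.minor, attr.binaryVersion, attr.ptxVersion);
        return 0;
      }
      ```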
  15. 14 Jul 2021, 1 commit
  16. 06 Jul 2021, 1 commit
    • Add gpu implementation of shuffle_batch_op (#33938) · c6b6ba1f
      Committed by Zeng Jinle (a Thrust shuffle-then-gather sketch follows this entry)
      * add gpu implementation of shuffle batch
      test=develop
      
      * add thrust cuda patches
      test=develop
      
      * fix macro guard
      
      * fix shuffle batch compile on windows/hip
      
      * fix hip compilation error
      
      * refine CMakeLists.txt
      
      * fix windows compile error
      
      * try to fix windows CI compilation error
      
      * fix windows compilation again
      
      * fix shuffle_batch op test on Windows
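      shuffle_batch permutes whole rows of a batch, and the natural GPU recipe for that is to
      shuffle an index vector and then gather rows through it. A minimal Thrust sketch of that
      recipe (not the Paddle op itself; thrust::shuffle needs Thrust >= 1.9.6, i.e. roughly
      CUDA 10.2+, which is likely what the "thrust cuda patches" bullet had to work around):

      ```cpp
      #include <cstdio>
      #include <thrust/device_vector.h>
      #include <thrust/fill.h>
      #include <thrust/gather.h>
      #include <thrust/iterator/counting_iterator.h>
      #include <thrust/iterator/transform_iterator.h>
      #include <thrust/random.h>
      #include <thrust/sequence.h>
      #include <thrust/shuffle.h>

      // Maps a flat element index to the element it is gathered from, so whole
      // rows move together under the shuffled row order.
      struct RowToElem {
        const int* row_idx;
        int cols;
        __host__ __device__ int operator()(int e) const {
          return row_idx[e / cols] * cols + e % cols;
        }
      };

      int main() {
        const int rows = 4, cols = 3;
        thrust::device_vector<float> batch(rows * cols);
        for (int i = 0; i < rows; ++i)  // row i holds the value i, easy to check
          thrust::fill(batch.begin() + i * cols, batch.begin() + (i + 1) * cols,
                       static_cast<float>(i));

        // Shuffle row indices on the device, then gather rows through them.
        thrust::device_vector<int> idx(rows);
        thrust::sequence(idx.begin(), idx.end());
        thrust::default_random_engine rng(2021);
        thrust::shuffle(idx.begin(), idx.end(), rng);

        thrust::device_vector<float> shuffled(rows * cols);
        auto map = thrust::make_transform_iterator(
            thrust::make_counting_iterator(0),
            RowToElem{thrust::raw_pointer_cast(idx.data()), cols});
        thrust::gather(map, map + rows * cols, batch.begin(), shuffled.begin());

        for (int e = 0; e < rows * cols; ++e)
          std::printf("%.0f%c", static_cast<float>(shuffled[e]),
                      e % cols == cols - 1 ? '\n' : ' ');
        return 0;
      }
      ```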
  17. 02 Jun 2021, 1 commit
  18. 26 May 2021, 1 commit
  19. 31 Mar 2021, 2 commits
  20. 30 Mar 2021, 1 commit
  21. 17 Mar 2021, 1 commit
  22. 19 Feb 2021, 1 commit
  23. 14 Jan 2021, 1 commit
  24. 27 Nov 2020, 1 commit
    • Detect TensorRT plugin fp16 at runtime (#27933) · b9e76a01
      Committed by Shang Zhizhou (a runtime capability-check sketch follows this entry)
      * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
      
      * compile with cuda9
      
      * add some unittest
      
      * notest;test=coverage
      
      * add unittest for trt plugin swish && split
      
      * update ernie unittest
      
      * fix some error message
      
      * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
      
      * fix compile error when CUDA_ARCH_NAME < Pascal

      * fix compile error
      
      * update unittest timeout
      
      * compile with cuda9
      
      * update error msg
      
      * fix code style
      
      * add some comments
      
      * add define IF_CUDA_ARCH_SUPPORT_FP16
      
      * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
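      The theme of this PR is moving the fp16 decision from build time (the removed
      -DSUPPORTS_CUDA_FP16) to run time, so one binary can take the fp16 plugin path only on
      GPUs that support it. The host-side half of that check is a compute-capability query; a
      hedged sketch, with an illustrative threshold (native __half arithmetic exists from sm_53;
      the exact policy behind CUDA_ARCH_FP16_SUPPORTED may differ):

      ```cpp
      #include <cstdio>
      #include <cuda_runtime.h>

      // Returns true if the device can run native half-precision math.
      // sm >= 53 is the hardware floor for __half arithmetic; real deployments
      // often gate on sm >= 60 or 70, where fp16 is actually fast.
      bool DeviceSupportsFp16(int device_id) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, device_id) != cudaSuccess) return false;
        const int sm = prop.major * 10 + prop.minor;
        return sm >= 53;
      }

      int main() {
        std::printf("fp16 plugin path: %s\n",
                    DeviceSupportsFp16(0) ? "enabled" : "disabled (fp32 fallback)");
        return 0;
      }
      ```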
  25. 21 Oct 2020, 2 commits
  26. 18 Sep 2020, 1 commit
  27. 09 Sep 2020, 1 commit
  28. 07 Sep 2020, 1 commit
  29. 20 Aug 2020, 1 commit
  30. 10 Aug 2020, 1 commit
  31. 09 Jul 2020, 1 commit
  32. 16 Jun 2020, 1 commit
  33. 10 Jun 2020, 1 commit
  34. 08 Jun 2020, 1 commit
    • Add -DPADDLE_CUDA_BINVER (#24928) · 90d420b1
      Committed by T8T9 (a sketch of consuming such a version macro follows this entry)
      * add -DPADDLE_CUDA_BINVER. test=develop, test=win_gpu
      
      * nvcc also picks up flags added via add_compile_options; avoid it unless you want those arguments passed to nvcc too. test=develop
      
      * test=develop, test=win_gpu
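      A compile definition like this lets host code branch on the CUDA version the wheel was
      built against without including any CUDA headers. A sketch of how such a macro is
      typically consumed; the major*10+minor encoding and the values below are assumptions for
      illustration, not Paddle's actual scheme:

      ```cpp
      #include <cstdio>

      // Assumed to be injected by the build system, e.g. -DPADDLE_CUDA_BINVER=102
      // for a CUDA 10.2 build (major * 10 + minor); absent on CPU-only builds.
      #ifndef PADDLE_CUDA_BINVER
      #define PADDLE_CUDA_BINVER 0
      #endif

      int main() {
      #if PADDLE_CUDA_BINVER >= 102
        std::printf("built against CUDA >= 10.2\n");
      #elif PADDLE_CUDA_BINVER > 0
        std::printf("built against an older CUDA toolkit (%d)\n", PADDLE_CUDA_BINVER);
      #else
        std::printf("CPU-only build\n");
      #endif
        return 0;
      }
      ```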
  35. 05 Jun 2020, 1 commit
    • Builtin cuda (#24904) · 211ef78c
      Committed by T8T9
      * support CUDA using cmake built-in way (#24395)
      
      * support CUDA using cmake built-in way. test=develop
      
      * test=develop
      
      * cmake_minimum_required 3.10
      
      * test=develop
  36. 28 May 2020, 1 commit
  37. 13 May 2020, 1 commit
  38. 12 May 2020, 1 commit