Commits · a1ec1d5a4933a8242c2b1b14deafd449de506e26 · BaiXuePrincess / Paddle

08 Nov, 2021 1 commit

Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a

Authored by wanghuancoder on Nov 08, 2021

* Use cuda virtual memory management and merge blocks, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* window dll, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* use autogrowthv2 for system allocator, test=develop

* remove ~CUDAVirtualMemAllocator(), test=develop

* refine, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix bug, test=develop

* revert system allocator, test=develop

* revert multiprocessing, test=develop

* fix AutoGrowthBestFitAllocatorV2 mutex, test=develop

* catch cudaErrorInitializationError when create allocator, test=develop

* fix cuMemSetAccess use, test=develop

* refine cuda api use, test=develop

* refine, test=develop

* for test, test=develop

* for test, test=develop

* switch to v2, test=develop

* refine virtual allocator, test=develop

* Record cuMemCreate and cuMemRelease, test=develop

* refine, test=develop

* avoid out of bounds, test=develop

* rename allocator, test=develop

* refine, test=develop

* use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop

* for test,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop
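The commit title pairs virtual memory management with block merging. As a general idea, when allocations live in one contiguous address range, free blocks whose ranges touch can be coalesced into a single larger block, which reduces fragmentation. The sketch below illustrates only that coalescing step in plain Python. `Block` and `merge_free_blocks` are hypothetical names chosen for illustration, and this is not Paddle's allocator code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A region of a (hypothetical) contiguous address range."""
    start: int   # byte offset of the first byte in the block
    size: int    # length of the block in bytes
    free: bool   # True if the block is currently unused


def merge_free_blocks(blocks: List[Block]) -> List[Block]:
    """Coalesce free blocks whose address ranges are directly adjacent."""
    merged: List[Block] = []
    for blk in sorted(blocks, key=lambda b: b.start):
        prev = merged[-1] if merged else None
        if (prev is not None and prev.free and blk.free
                and prev.start + prev.size == blk.start):
            # Extend the previous free block instead of keeping two entries.
            merged[-1] = Block(prev.start, prev.size + blk.size, True)
        else:
            merged.append(Block(blk.start, blk.size, blk.free))
    return merged


# Illustrative layout: two adjacent free blocks, one used block, one free block.
blocks = [
    Block(0, 256, True),
    Block(256, 256, True),
    Block(512, 128, False),
    Block(640, 384, True),
]
result = merge_free_blocks(blocks)
```

In this example the first two blocks merge into one 512-byte free block, while the last free block stays separate because a used block sits between them.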

28 Sep, 2021 1 commit

Add paddle.device.cuda.get_device_properties (#35661) · 4cbed9e5

Authored by Yanxing Shi on Sep 28, 2021

* Initial Commit

* add unittest and add error information

* modify doc

* fix some error

* fix some word

* fix bug cudaDeviceProp* and modify error explanation

* fix cudaDeviceProp* error and unittest samples

* fix hip error and PADDLE_WITH_HIP

* update style

* fix error is_compiled_with_cuda

* fix paddle.device.cuda.get_device_properties

* fix error for multi thread safe

* update style

* merge conflict

* modify after mentor review

* update style

* delete word

* fix unittest error for windows

* support string input and modify some code

* modify doc to support string input

* fix error for express information

* fix error for express information

* fix unittest for windows

* fix device.startswith('gpu:')

* format error and doc

* fix after review

* format code

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix py2 error

* fix wrong words and doc

* fix _gpuDeviceProperties
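Several of the messages above mention string input and a fix around `device.startswith('gpu:')`. A minimal sketch of how a device argument given as `None`, an integer, or a string like `'gpu:0'` might be normalized to an index is shown below. `parse_device_id` is a hypothetical helper, not Paddle's implementation, and mapping `None` to device 0 is an assumption made only for this example.

```python
from typing import Optional, Union


def parse_device_id(device: Optional[Union[int, str]] = None) -> int:
    """Hypothetical helper: map None, an int, or 'gpu:N' to a device index."""
    if device is None:
        return 0  # assumption: fall back to the first device
    if isinstance(device, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("device must be an int or a string, not bool")
    if isinstance(device, int):
        if device < 0:
            raise ValueError(f"device index must be non-negative, got {device}")
        return device
    if isinstance(device, str):
        if device.startswith("gpu:"):
            index = device[len("gpu:"):]
            if index.isdecimal():
                return int(index)
        raise ValueError(f"expected a string like 'gpu:0', got {device!r}")
    raise TypeError(f"unsupported device type: {type(device).__name__}")
```

Under these assumptions, `parse_device_id("gpu:2")` and `parse_device_id(2)` both yield the index 2, while a string such as `"cpu"` raises `ValueError`.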

14 Sep, 2021 1 commit

Add api paddle.device.cuda.empty_cache to release idle gpu memory hold by allocator. (#35427) · 83932715

Authored by chenenquan on Sep 14, 2021

* Add empty_cache api to release idle gpu memory hold by allocator,test=develop

* Add empty_cache api to release idle gpu memory hold by allocator,test=develop

* Add empty_cache api to release idle gpu memory hold by allocator,test=develop

* Fix test coverage problem for empty_cache

* delete redundant check for empty_cache

* fix the problem of empty_cache's doc

* delete the nvidia-smi comment in doc of empty_cache, test=document_fix

07 Feb, 2021 1 commit

[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774) · 34f1628c

Authored by Qi Li on Feb 07, 2021

16 Aug, 2020 1 commit

[API2.0] add op for cudnn version query test=develop (#26180) · 0b81d763

Authored by wangchaochaohu on Aug 16, 2020

04 Mar, 2020 1 commit

Add flags to limit gpu memory (#22793) · d41d802b

Authored by Zeng Jinle on Mar 04, 2020

* add recorded cuda memory apis, fix typo, test=develop

* add more ut, test=develop

* follow comments, test=develop

* fix py35 incompatible issues, test=develop

09 Jan, 2020 1 commit

[Feature] Lite subgraph (#22114) · ad0dfb17

Authored by 石晓伟 on Jan 09, 2020

08 Jan, 2020 1 commit

Refine stack op to improve xlnet performance, test=develop (#22142) · 3d4f2aa6

Authored by zhaoyuchen2018 on Jan 08, 2020

stack's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy
reduces CPU time.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

14 Nov, 2019 1 commit

Improve topk performance. (#21087) · b93870e6

Authored by zhaoyuchen2018 on Nov 13, 2019

* Improve topk performance.

Given 200000 elements to compute topk:
before optimization: 1 s
after optimization: 0.0028 s

* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
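The two timings quoted in this commit message imply a speedup that is easy to work out. The snippet below does only that arithmetic; the variable names are illustrative and the numbers are the ones stated in the message.

```python
# Timings quoted in the commit message for topk over 200000 elements.
before_s = 1.0      # seconds before the optimization
after_s = 0.0028    # seconds after the optimization

# Speedup is the ratio of the old running time to the new one.
speedup = before_s / after_s
print(f"speedup: about {speedup:.0f}x")  # prints "speedup: about 357x"
```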

06 Nov, 2019 1 commit

refine error message of allocator again, test=develop (#21023) · a710ccc0

Authored by Zeng Jinle on Nov 06, 2019

31 Jul, 2019 1 commit

GPU allocation uses fraction of available memory (#18896) · ea6ee76f

Authored by Huihuang Zheng on Jul 31, 2019
```
GPU allocation uses fraction of available memory, also fix the GetUsed without lock
```
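The description above says GPU allocation now uses a fraction of available memory. One plausible reading, assumed here, is that the size is computed from the memory that is currently free rather than from the device total. The sketch below contrasts the two bases; `chunk_size_bytes` is a hypothetical function and all numbers are made up for illustration.

```python
def chunk_size_bytes(base_bytes: int, fraction: float) -> int:
    """Hypothetical sizing rule: take `fraction` of the given memory amount."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(base_bytes * fraction)


GIB = 1024 ** 3
total = 16 * GIB      # illustrative device size
available = 6 * GIB   # illustrative free memory; the rest is in use elsewhere

from_available = chunk_size_bytes(available, 0.5)  # sized from free memory
from_total = chunk_size_bytes(total, 0.5)          # sized from total memory

# Sizing from the total can request more than is actually free.
fits_available = from_available <= available  # True in this example
fits_total = from_total <= available          # False in this example
```

With these illustrative numbers, half of the available memory is 3 GiB and fits, while half of the total is 8 GiB and exceeds the 6 GiB that is free.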

21 Mar, 2019 1 commit

add more unittest · 953214ad

Authored by sneaxiy on Mar 19, 2019

modify allocator strategy
remove changes of legacy buddy_allocator
test=develop

19 Mar, 2019 1 commit

add allocator flags · 22715487

Authored by zhhsplendid on Mar 19, 2019

```
test=develop
```

04 Dec, 2018 1 commit

[Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661) · 29d9fb53

Authored by Wu Yi on Dec 04, 2018

* wip multi process multi gpu dist training

* workable for p2p

* update test=develop

* change back env name test=develop

* fix alloc init

* fix cpu build test=devlop

* fix mac tests test=develop

* refine code

* refine test=develop

22 Nov, 2018 1 commit

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Authored by chengduo on Nov 22, 2018

* refine cublase
test=develop

* code refine

* refine cublas

* add GEMME_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* to be compatible with gcc4.8
test=develop

15 Oct, 2018 1 commit

add cuda version display (#13885) · 2c9839c8

Authored by chengduo on Oct 15, 2018

```
test=develop
```

27 Sep, 2018 1 commit

Revert "Some trivial optimization (#13530)" · a4f7696a

Authored by typhoonzero on Sep 27, 2018

```
This reverts commit 1d91a49d.
```

26 Sep, 2018 1 commit

Some trivial optimization (#13530) · 1d91a49d

Authored by chengduo on Sep 26, 2018

* some trivial opt

* remove the fix of lod_tensor and shrink_rnn_memory_op

* refine ShrinkRNNMemoryOp

test=develop

23 Apr, 2018 1 commit

Add synchronous TensorCopy and use it in double buffer · 9f11da59

Authored by fengjiayi on Apr 23, 2018

08 Apr, 2018 2 commits

Update (#9717) · 535646cf

Authored by Yi Wang on Apr 07, 2018

Fix cpplint errors with paddle/fluid/platform/gpu_info.* (#9710) · 0c43a376

Authored by Yi Wang on Apr 07, 2018

```
* Fix cpplint errors with paddle/fluid/platform/gpu_info.*

* Update
```

10 Mar, 2018 1 commit

add gpu info func to get compute cap · 1998d5af

Authored by Kexin Zhao on Mar 09, 2018

03 Mar, 2018 1 commit

get max threads of GPU · 00e596ed

Authored by chengduoZH on Mar 02, 2018

12 Feb, 2018 1 commit

Fix the grammar in copyright. (#8403) · 24509f4a

Authored by qingqing01 on Feb 12, 2018

10 Feb, 2018 1 commit

Move file to fluid/; Edit CMakeLists.txt · 90648f33

Authored by Yi Wang on Feb 09, 2018

22 Dec, 2017 1 commit

"remove GPU Sync Interface" (#6793) · abde3130

Authored by dzhwinter on Dec 22, 2017

* "remove GPU Sync Interface"

* "fix typo"

* "fix type cast error"

* "fix related Copy with stream"

* "fix failed tests with DevicePool"

* "fix stupid removed position error"

16 Nov, 2017 1 commit

"fix accuracy kernel bug" (#5673) · e97b8987

Authored by dzhwinter on Nov 15, 2017

```
* "fix accuracy kernel bug"

* "relaunch ci"
```

10 Oct, 2017 1 commit

remove unused PADDLE_ONLY_CPU comment · 871a3f6e

Authored by Luo Tao on Oct 10, 2017

05 Oct, 2017 3 commits

Rename platform::GetDeviceCount into platform::GetCUDADeviceCount · 2b204f04

Authored by Yi Wang on Oct 04, 2017

Use PADDLE_WITH_CUDA instead of PADDLE_WITH_GPU · 4558807c

Authored by Yi Wang on Oct 04, 2017

Change `PADDLE_ONLY_CPU` to `PADDLE_WITH_GPU` · 84500f94

Authored by Yu Yang on Oct 04, 2017

By shell command

```bash
sed -i 's#ifdef PADDLE_ONLY_CPU#ifndef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
sed -i 's#ifndef PADDLE_ONLY_CPU#ifdef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
```
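The two `sed` commands above flip the sense of each guard: `#ifdef PADDLE_ONLY_CPU` becomes `#ifndef PADDLE_WITH_GPU`, and `#ifndef PADDLE_ONLY_CPU` becomes `#ifdef PADDLE_WITH_GPU`. The sketch below applies the same two literal substitutions, in the same order, to an in-memory string so the effect can be inspected. `rename_guards` is a hypothetical helper and the sample source is made up.

```python
def rename_guards(source: str) -> str:
    """Apply the same two literal substitutions as the sed commands, in order."""
    source = source.replace("ifdef PADDLE_ONLY_CPU", "ifndef PADDLE_WITH_GPU")
    source = source.replace("ifndef PADDLE_ONLY_CPU", "ifdef PADDLE_WITH_GPU")
    return source


# Made-up sample containing one guard of each kind.
sample = (
    "#ifndef PADDLE_ONLY_CPU\n"
    "#include <cuda_runtime.h>\n"
    "#endif\n"
    "#ifdef PADDLE_ONLY_CPU\n"
    "// CPU-only fallback\n"
    "#endif\n"
)
converted = rename_guards(sample)
print(converted)
```

The order of the two substitutions does not matter here, because neither search pattern matches text produced by the other. Other spellings, such as `#if defined(PADDLE_ONLY_CPU)` or a trailing `// PADDLE_ONLY_CPU` comment after `#endif`, would not match either pattern and would need separate handling.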

26 Sep, 2017 1 commit

fix nv_library (#4370) · d0ad82cf

Authored by Qiao Longfei on Sep 25, 2017

```
* fix nv_library

* fix symbol in gpu_info.h
```

18 Aug, 2017 2 commits

follow comments · b3ab15a7

Authored by liaogang on Aug 18, 2017

Add ENVIRONMENT interface interface · 55437b58

Authored by liaogang on Aug 18, 2017

19 Jul, 2017 1 commit

Add cuda memcpy in gpu_info · b0588641

Authored by liaogang on Jul 19, 2017

11 Jul, 2017 1 commit

FIX: merge conflicts · 383b96f3

Authored by liaogang on Jul 11, 2017

04 Jul, 2017 1 commit

ENH: Add buddy allocator Free · 0ba63475

Authored by liaogang on Jul 04, 2017

29 Jun, 2017 2 commits

ENH: Add gpu info interface · 6e7209f0

Authored by liaogang on Jun 29, 2017

ENH: Add Gpu info · d3b77a5b

Authored by liaogang on Jun 29, 2017

28 Jun, 2017 1 commit

ENH: clang-format · 9490d243

Authored by liaogang on Jun 28, 2017

BaiXuePrincess / Paddle is up to date with the project it was forked from