Commits · f9591bb172e7274a77bfdcb6493579824aec8b47 · PaddlePaddle / Paddle

08 Oct, 2021 1 commit

Support CUDA Graph on ParallelExecutor (#36250) · f9591bb1

Committed by Zeng Jinle on Oct 08, 2021

* support CUDA Graph on PE

* add ut, fix CI compile

* reduce memory consumption

* fix CUDA 10 CI

* improve coverage

* improve python coverage

f9591bb1

29 Sep, 2021 1 commit

Add basic support for CUDA Graph (#36190) · 21b93c3d

Committed by Zeng Jinle on Sep 29, 2021

* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error

21b93c3d

28 Sep, 2021 1 commit

Add paddle.device.cuda.get_device_properties (#35661) · 4cbed9e5

Committed by Yanxing Shi on Sep 28, 2021

* Initial Commit

* add unittest and add error information

* modify doc

* fix some error

* fix some word

* fix bug cudaDeviceProp* and modify error explanation

* fix cudaDeviceProp* error and unittest samples

* fix hip error and PADDLE_WITH_HIP

* update style

* fix error is_compiled_with_cuda

* fix paddle.device.cuda.get_device_properties

* fix error for multi thread safe

* update style

* merge conflict

* modify after mentor review

* update style

* delete word

* fix unittest error for windows

* support string input and modify some code

* modify doc to support string input

* fix error for express information

* fix error for express information

* fix unittest for windows

* fix device.startswith('gpu:')

* format error and doc

* fix after review

* format code

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix py2 error

* fix wrong words and doc

* fix _gpuDeviceProperties

4cbed9e5
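
Several bullets in this commit mention accepting a string input and checking `device.startswith('gpu:')`. A minimal sketch of that kind of argument handling is shown below. The helper name `parse_gpu_device`, its default, and its error messages are hypothetical and are not taken from Paddle's source; the sketch only assumes that the argument may be `None`, an integer index, or a string of the form `gpu:N`.

```python
def parse_gpu_device(device=None, default_id=0):
    """Return a GPU index from None, an int, or a 'gpu:N' string.

    Hypothetical helper for illustration only, not Paddle's implementation.
    """
    if device is None:
        # No device given: fall back to an assumed default index.
        return default_id
    if isinstance(device, bool):
        # bool is a subclass of int; reject it explicitly.
        raise ValueError("device must be an int or a 'gpu:N' string")
    if isinstance(device, int):
        index = device
    elif isinstance(device, str):
        if not device.startswith("gpu:"):
            raise ValueError("string device must look like 'gpu:N'")
        text = device[len("gpu:"):]
        if not (text.isascii() and text.isdigit()):
            raise ValueError("device index must be a non-negative integer")
        index = int(text)
    else:
        raise ValueError("device must be an int or a 'gpu:N' string")
    if index < 0:
        raise ValueError("device index must be non-negative")
    return index
```

For example, `parse_gpu_device("gpu:1")` and `parse_gpu_device(1)` both yield the index 1, while `parse_gpu_device("cpu")` raises `ValueError`.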

14 Sep, 2021 1 commit

Add api paddle.device.cuda.empty_cache to release idle gpu memory held by allocator. (#35427) · 83932715

Committed by chenenquan on Sep 14, 2021

* Add empty_cache api to release idle gpu memory held by allocator, test=develop

* Add empty_cache api to release idle gpu memory held by allocator, test=develop

* Add empty_cache api to release idle gpu memory held by allocator, test=develop

* Fix test coverage problem for empty_cache

* delete redundant check for empty_cache

* fix the problem of empty_cache's doc

* delete the nvidia-smi comment in doc of empty_cache, test=document_fix

83932715

31 Aug, 2021 1 commit

Support CostInfo and MemProfiler in InterpreterCore (#34981) · 572bad8a

Committed by Aurelius84 on Aug 31, 2021

* polish code

* fix unittest on windows

* refine pybind interface

* support statistic MemSize of AllocatorPool

* Replace mutex into atomic

572bad8a

09 Apr, 2021 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

07 Feb, 2021 1 commit

[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774) · 34f1628c

Committed by Qi Li on Feb 07, 2021

34f1628c

04 Feb, 2021 1 commit

use iwyu clean include second time, test=develop (#30829) · 35c5b23f

Committed by wanghuancoder on Feb 04, 2021

* use iwyu clean include second time, test=develop

35c5b23f

19 Jan, 2021 1 commit

unify calling cudaSetDevice (#30470) · 81217a94

Committed by Leo Chen on Jan 19, 2021

* unify calling cudaSetDevice

* fix compile

81217a94

13 Nov, 2020 1 commit

fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547) · 849467b5

Committed by Zhou Wei on Nov 13, 2020

849467b5

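
The commit title above concerns a `CUDA_VISIBLE_DEVICES` value that starts or ends with quotation marks. One way such a value could be normalized is sketched below. The function name `parse_visible_devices` is hypothetical, and the sketch assumes only that stray single or double quotes at either end should be ignored before splitting on commas; it does not reproduce the actual fix in this commit.

```python
def parse_visible_devices(raw):
    """Split a CUDA_VISIBLE_DEVICES-style value into device tokens.

    Hypothetical helper for illustration: surrounding whitespace and
    quotation marks are stripped before the comma-separated list is split.
    """
    cleaned = raw.strip().strip("'\"")
    # Drop empty tokens so values such as "" or "0,,1" do not yield blanks.
    return [token.strip() for token in cleaned.split(",") if token.strip()]
```

With this sketch, the quoted value `"0,1"` (quotes included in the variable's content) and the plain value `0,1` produce the same two tokens.
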
16 Aug, 2020 1 commit

[API2.0] add op for cudnn version query test=develop (#26180) · 0b81d763

Committed by wangchaochaohu on Aug 16, 2020

0b81d763

15 Jul, 2020 1 commit

refine PADDLE_ENFORCE (#25456) · c10dcff1

Committed by GaoWei8 on Jul 15, 2020

* Refine PADDLE_ENFORCE in paddle/fluid/platform
test=develop

c10dcff1

07 Jul, 2020 1 commit

Refine PADDLE_ENFORCE (#25369) · ea7e5325

Committed by GaoWei8 on Jul 07, 2020

* refine PADDLE_ENFORCE
test=develop

ea7e5325

16 Jun, 2020 1 commit

Monitor Framework (#24079) · 5822862d

Committed by hutuxian on Jun 16, 2020

* Add a StatValue class in the backend to represent a stat.
* Add a singleton StatRegistry to maintain the collection of stats.
* For the sake of code neatness, we only support type of int and float, which can cover most of the scenarios.

5822862d
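
The commit body above describes a `StatValue` class that represents one stat and a singleton `StatRegistry` that keeps the collection of stats, limited to int and float values. The Python sketch below illustrates that design in miniature. The original is a backend implementation, so every method name here (`increase`, `get`, `reset`, `instance`, `register`) is an assumption made for illustration rather than the framework's actual interface.

```python
import threading


class StatValue:
    """A single named numeric stat (int or float only)."""

    def __init__(self, name, initial=0):
        if isinstance(initial, bool) or not isinstance(initial, (int, float)):
            raise TypeError("only int and float stats are supported")
        self.name = name
        self._value = initial
        self._lock = threading.Lock()

    def increase(self, delta):
        # Add delta under the lock and return the updated value.
        with self._lock:
            self._value += delta
            return self._value

    def get(self):
        with self._lock:
            return self._value

    def reset(self, value=0):
        with self._lock:
            self._value = value


class StatRegistry:
    """Process-wide collection of stats, exposed through instance()."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self):
        self._stats = {}
        self._lock = threading.Lock()

    @classmethod
    def instance(cls):
        # Create the shared registry lazily, exactly once.
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def register(self, name, initial=0):
        # Return the existing stat if the name is already registered.
        with self._lock:
            if name not in self._stats:
                self._stats[name] = StatValue(name, initial)
            return self._stats[name]

    def get(self, name):
        with self._lock:
            return self._stats[name]
```

A caller would obtain the registry with `StatRegistry.instance()`, register a stat by name once, and then update it from anywhere in the process through the same registry.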

20 Apr, 2020 1 commit

Optimize the error messages of paddle CUDA API (#23816) · 78170037

Committed by Zhou Wei on Apr 20, 2020

* Optimize the error messages of paddle CUDA API, test=develop

* fix the error messages of paddle CUDA API, test=develop

* Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop

* remove build_ex_string,test=develop

* merge conflict,test=develop

78170037

04 Mar, 2020 1 commit

Add flags to limit gpu memory (#22793) · d41d802b

Committed by Zeng Jinle on Mar 04, 2020

* add recorded cuda memory apis, fix typo, test=develop

* add more ut, test=develop

* follow comments, test=develop

* fix py35 incompatible issues, test=develop

d41d802b

09 Jan, 2020 1 commit

[Feature] Lite subgraph (#22114) · ad0dfb17

Committed by 石晓伟 on Jan 09, 2020

ad0dfb17

08 Jan, 2020 1 commit

Refine stack op to improve xlnet performance, test=develop (#22142) · 3d4f2aa6

Committed by zhaoyuchen2018 on Jan 08, 2020

The stack op's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy
reduces CPU time.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

3d4f2aa6

14 Nov, 2019 1 commit

Improve topk performance. (#21087) · b93870e6

Committed by zhaoyuchen2018 on Nov 13, 2019

* Improve topk performance.

Computing topk over 200000 data items:
before optimization: 1s
after optimization: 0.0028s

* Refine return value.
* Add cuda util funtions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

b93870e6

06 Nov, 2019 1 commit

refine error message of allocator again, test=develop (#21023) · a710ccc0

Committed by Zeng Jinle on Nov 06, 2019

a710ccc0

01 Nov, 2019 1 commit

gpu info query refine test=develop (#20904) · 7695b713

Committed by wangchaochaohu on Nov 01, 2019

7695b713

12 Oct, 2019 1 commit

enable cpu machine to run paddle in gpu lib · 751812a6

Committed by Wilber on Oct 12, 2019

enable cpu machine to run paddle model in gpu lib

751812a6

16 Aug, 2019 1 commit

move_flags_to_unified_files_for_management, test=develop (#19224) · 708bd979

Committed by Zeng Jinle on Aug 16, 2019

708bd979

01 Aug, 2019 1 commit

Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 (#18950) · 08fa98f7

Committed by Zeng Jinle on Aug 01, 2019

* fix gpu_info, test=develop

* fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop

* fix bug again for reserving size, test=develop

08fa98f7

31 Jul, 2019 1 commit

GPU allocation uses fraction of available memory (#18896) · ea6ee76f

Committed by Huihuang Zheng on Jul 31, 2019

GPU allocation uses fraction of available memory, also fix the GetUsed without lock

ea6ee76f

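
The commit above states that GPU allocation uses a fraction of the available memory. The arithmetic behind that idea can be sketched as follows. The function name `gpu_alloc_size` and the `reserve_bytes` parameter are hypothetical and introduced only for illustration; the real calculation in `gpu_info.cc` may differ in details such as reserved memory and alignment.

```python
def gpu_alloc_size(available_bytes, fraction, reserve_bytes=0):
    """Return the bytes to allocate as a fraction of available memory.

    Illustrative sketch only. `reserve_bytes` is an assumed amount held
    back before the fraction is applied; it defaults to zero.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if available_bytes < 0 or reserve_bytes < 0:
        raise ValueError("byte counts must be non-negative")
    usable = max(available_bytes - reserve_bytes, 0)
    return int(usable * fraction)
```

The point of basing the size on available rather than total memory is that a request can still fit when other processes already occupy part of the device.
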
18 Jul, 2019 1 commit

Optimize the content of error reporting information, print error code and... · 772e0956

Committed by zhouwei25 on Jul 18, 2019

Optimize the content of error reporting information, print error code and official document web sites (#18671)

optimize the error reporting information of cuda related API
index on develop: 130ac177 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

772e0956

16 Jul, 2019 1 commit

print out error code of cudaGetDeviceProperties if failed (#18643) · 75953096

Committed by liuwei1031 on Jul 16, 2019

75953096

30 Apr, 2019 1 commit

Fix a typo in gpu_info.cc (#17175) · e4a53324

Committed by Huihuang Zheng on Apr 30, 2019

test=develop

e4a53324

21 Mar, 2019 1 commit

add more unittest · 953214ad

Committed by sneaxiy on Mar 19, 2019

modify allocator strategy
remove changes of legacy buddy_allocator
test=develop

953214ad

19 Mar, 2019 1 commit

add allocator flags · 22715487

Committed by zhhsplendid on Mar 19, 2019

test=develop

22715487

24 Jan, 2019 2 commits

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

3008fa12

lazy_allocator · 51227bd4

Committed by sneaxiy on Jan 23, 2019

test=develop

51227bd4

04 Dec, 2018 1 commit

[Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661) · 29d9fb53

Committed by Wu Yi on Dec 04, 2018

* wip multi process multi gpu dist training

* workable for p2p

* update test=develop

* change back env name test=develop

* fix alloc init

* fix cpu build test=devlop

* fix mac tests test=develop

* refine code

* refine test=develop

29d9fb53

27 Nov, 2018 1 commit

minor fix · 38715e6f

Committed by peizhilin on Nov 27, 2018

38715e6f

26 Nov, 2018 2 commits

Revert the changes of VLOG · 53433d7f

Committed by minqiyang on Nov 26, 2018

test=develop

53433d7f

Given the different fraction_of_gpu_memory_to_use depends on platform · b2f8d418

Committed by peizhilin on Nov 26, 2018

b2f8d418

22 Nov, 2018 2 commits

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Committed by chengduo on Nov 22, 2018

* refine cublas
test=develop

* code refine

* refine cublas

* add GEMME_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* to be compatible with gcc4.8
test=develop

00b9e9a1

fix unit test cases · 7c8c9dc9

Committed by peizhilin on Nov 22, 2018

7c8c9dc9

08 Nov, 2018 1 commit

Change the origin VLOG level to 10 times · 0c3227a5

Committed by minqiyang on Nov 08, 2018

Fix code to support cpplint syntax check

test=develop

0c3227a5

15 Oct, 2018 1 commit

add cuda version display (#13885) · 2c9839c8

Committed by chengduo on Oct 15, 2018

test=develop

2c9839c8

PaddlePaddle / Paddle synced successfully about 1 year ago
