提交 · 3e7825f375ce0a3e91d11979b883acfbfa7556f1 · PaddlePaddle / Paddle

15 2月, 2022 1 次提交

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

由 ronnywang 提交于 2月 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop
Co-authored-by: Nqili93 <qili93@qq.com>

3e7825f3

09 2月, 2022 1 次提交
- C
  
  move stream into pten (#39392) · 266955a9
  由 Chen Weihang 提交于 2月 09, 2022
  
  266955a9
08 2月, 2022 1 次提交

Support allocate CUDA managed memory (#39075) · 42910361

由 From00 提交于 2月 08, 2022

* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50

42910361

27 1月, 2022 1 次提交

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

由 Aurelius84 提交于 1月 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according reviewer

5631da9c

25 1月, 2022 1 次提交
- F
  
  Add GetBasePtr interface in paddle::memory (#39145) · b2a7261d
  由 From00 提交于 1月 25, 2022
  
  b2a7261d
17 1月, 2022 1 次提交

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

由 Wilber 提交于 1月 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

13 1月, 2022 1 次提交
- 石
  
  splits allocation for pten, test=develop (#38853) · 277cf900
  由石晓伟提交于 1月 13, 2022
  
  277cf900
31 12月, 2021 1 次提交

[MLU]support calling mlu op from python interface (#38292) · b6bf650a

由 fwenguang 提交于 12月 31, 2021

* [MLU]support calling mlu op from python interface

* [MLU]fix

* fix

* [mlu]fix mlu_places

* [mlu]fix required mlu

* fix

* [MLU]fix tensor copy

* [mlu] fix MLUPlace call path

b6bf650a

28 12月, 2021 1 次提交

Utilize StreamSafeCUDAAllocator to support fast GC in new executor (#37642) · 0c7153a4

由 From00 提交于 12月 28, 2021

* fix reshape move storage error

* remove needless set type

* alloc tensor by shared storage

* Utilize StreamSafeCUDAAllocator to support fast GC in new executor

* Fix compile error for Windows and ROCm

* Fix compile error for Windows

* Modify UT stream_safe_cuda_alloc_test

* Modify UT stream_safe_cuda_alloc_test

* Rewrite fast GC

* Rewrite fast GC

* Fix compile error for BOOST_GET_CONST

* Fix compile error for BOOST_GET_CONST

* Changes default stream for StreamSafeCUDAAllocator

* Fix a small CI error

* Remove some redundant code

* Fix conflict

* Fix compile error for ROCm

* Fix Windoes CI error

* Fix CI error

* Remove some unnecessary code

* Fix CI error

* Add UT for fast GC

* Fix CI error

* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile

* Use RWLock in GetAllocator

* Fix CI error
Co-authored-by: NChen Weihang <chenweihang@baidu.com>
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

0c7153a4

27 12月, 2021 1 次提交
- L
  add device-agnostic stream class (#38391) · 6b5e33b4
  由 Leo Chen 提交于 12月 27, 2021
```
* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile
```
  6b5e33b4
20 12月, 2021 2 次提交
- F
  
  [MLU]add mlu backend (#38207) · 76514a1f
  由 fwenguang 提交于 12月 20, 2021
  
  76514a1f
- F
  
  Skip zero-size Allocation in RecordStream (#38264) · 48937020
  由 From00 提交于 12月 20, 2021
  
  48937020
17 12月, 2021 2 次提交

Get base pointer from Allocation (#37978) · 431a2d6a

由 From00 提交于 12月 17, 2021

* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy

431a2d6a

F

Add GetStream Interface for StreamSafeCUDAAllocator (#38195) · b0d12d99
由 From00 提交于 12月 17, 2021

b0d12d99

10 12月, 2021 1 次提交
- S
  
  make cuda graph thread local allocator (#37814) · 62b1f38c
  由 sneaxiy 提交于 12月 10, 2021
  
  62b1f38c
08 12月, 2021 1 次提交

Fix CUDAGraphAllocator bug for StreamSafeCUDAAllocator (#37821) · b4a67491

由 From00 提交于 12月 08, 2021

* Fix CUDAGraph bug for StreamSafeCUDAAllocator

* Add CUDAGrapthAllocator check in multi-stream interface

* Set FLAGS_use_stream_safe_cuda_allocator defaulted to false

* Fix environment error for cmake

* Fix cmake error

* Add UT of GetAllocatorInterfaceTest

* Add UT of CUDAGraphExceptionTest

* Enhance CUDAGraphExceptionTest

b4a67491

07 12月, 2021 1 次提交
- J
  
  add ipu device p1 (#37841) · c9a3c669
  由 jianghaicheng 提交于 12月 07, 2021
  
  c9a3c669
03 12月, 2021 1 次提交
- R
  refine structure for cuda and rocm (#37202) · a6d2fddb
  由 ronnywang 提交于 12月 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
  a6d2fddb
27 11月, 2021 1 次提交

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

由 Aganlengzi 提交于 11月 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

25 11月, 2021 1 次提交

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

由 From00 提交于 11月 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator

b9c464c3

23 11月, 2021 1 次提交
- Q
  [XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
  由 Qi Li 提交于 11月 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```
  79800978
08 11月, 2021 1 次提交

Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a

由 wanghuancoder 提交于 11月 08, 2021

* Use cuda virtual memory management and merge blocks, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* window dll, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* use autogrowthv2 for system allocator, test=develop

* remove ~CUDAVirtualMemAllocator(), test=develop

* refine, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix bug, test=develop

* revert system allocator, test =develop

* revert multiprocessing, test=develop

* fix AutoGrowthBestFitAllocatorV2 mutxt, test=develop

* catch cudaErrorInitializationError when create allocator, test=develop

* fix cuMemSetAccess use, test=develop

* refine cuda api use, test=develop

* refine, test=develop

* for test, test=develop

* for test, test=develop

* switch to v2, test=develop

* refine virtual allocator, test=develop

* Record cuMemCreate and cuMemRelease, test=develop

* refine, test=develop

* avoid out of bounds, test=develop

* rename allocator, test=develop

* refine, test=develop

* use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop

* for test,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

a1ec1d5a

01 11月, 2021 1 次提交
- L
  [new-exec] refine vlog of interpretercore (#36865) · 4c93c4c3
  由 Leo Chen 提交于 11月 01, 2021
```
* refine vlog of interpretercore

* fix ut
```
  4c93c4c3
11 10月, 2021 1 次提交

refine auto_growth allocator (#35732) · 6d353aa5

由 Leo Chen 提交于 10月 11, 2021

* do not use alignedAllocator when cuda has alignment

* update test

* fix error during multiple process

6d353aa5

29 9月, 2021 1 次提交

Add basic support for CUDA Graph (#36190) · 21b93c3d

由 Zeng Jinle 提交于 9月 29, 2021

* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error

21b93c3d

17 9月, 2021 1 次提交

Make flag adding easier (#35823) · 2c781455

由 Zeng Jinle 提交于 9月 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

2c781455

03 8月, 2021 1 次提交
- Q
  support Kunlun2 (#34459) · 2d0f3d9b
  由 QingshuChen 提交于 8月 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
  2d0f3d9b
19 7月, 2021 1 次提交
- Q
  
  [NPU] add is_empty_op_npu, test=develop (#34234) · d4fb5c68
  由 Qi Li 提交于 7月 19, 2021
  
  d4fb5c68
12 5月, 2021 1 次提交
- L
  
  [NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796
  由 liym27 提交于 5月 12, 2021
  
  6b3bb796
09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

04 2月, 2021 1 次提交
- W
  use iwyu clean include second time, test=develop (#30829) · 35c5b23f
  由 wanghuancoder 提交于 2月 04, 2021
```
* use iwyu clean include second time, test=develop
```
  35c5b23f
01 2月, 2021 1 次提交
- Q
  
  [ROCM] update fluid memory for rocm35 (part1), test=develop (#30758) · 69875dc4
  由 Qi Li 提交于 2月 01, 2021
  
  69875dc4
12 1月, 2021 1 次提交
- 石
  
  fix header file paths of gflags, commit 3, test=develop (#30273) · efa54629
  由石晓伟提交于 1月 12, 2021
  
  efa54629
20 11月, 2020 1 次提交
- G
  
  Fix gpu memory allocation bug. (#28703) · 1dad8cea
  由 gongweibao 提交于 11月 20, 2020
  
  1dad8cea
06 11月, 2020 1 次提交
- W
  
  Update memory release interface. (#28456) · ced5c40c
  由 Wilber 提交于 11月 05, 2020
  
  ced5c40c
04 11月, 2020 1 次提交
- W
  
  [Inference] Memory modification for ShrinkMemory. (#28355) · 05114693
  由 Wilber 提交于 11月 04, 2020
  
  05114693
21 8月, 2020 1 次提交

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

由 QingshuChen 提交于 8月 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
 * test=kunlun

* support xpu op in separate file
 * test=kunlun

* update XPU error message and remove duplicated code

 * test=kunlun

* minor
 * test=kunlun

* minor
 * test=kunlun

138ecf24

24 6月, 2020 1 次提交

Refine error message in memory folder (#25095) · ff5be2fb

由 Leo Chen 提交于 6月 24, 2020

* refine PADDLE_THROW, test=develop

* refine error msg, test=develop

* refine cuda error, test=develop

* follow comments, test=develop

* fix compile problem, test=develop

* fix bug, test=develop

ff5be2fb

21 4月, 2020 1 次提交

石

New feature: thread local allocator, test=develop (#23989) · d2584a70

由石晓伟提交于 4月 21, 2020

* add the thread_local_allocator, test=develop

* refactor the thread_local_allocator, test=develop

* provides option setting strategy, test=develop

d2584a70

28 11月, 2019 1 次提交

Use system allocator in OpTest (#21335) · 09696d5d

由 Zeng Jinle 提交于 11月 28, 2019

* use system allocator in unittests, test=develop

* fix op bugs, test=develop

* fix tensor copy bug when src and dst are the same, test=develop

09696d5d

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功