Commits · 3e7825f375ce0a3e91d11979b883acfbfa7556f1 · PaddlePaddle / Paddle

15 Feb, 2022 · 1 commit

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

Committed by ronnywang on Feb 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop

Co-authored-by: qili93 <qili93@qq.com>

08 Feb, 2022 · 1 commit

Support allocate CUDA managed memory (#39075) · 42910361

Committed by From00 on Feb 08, 2022

* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50

25 Jan, 2022 · 1 commit

Add GetBasePtr interface in paddle::memory (#39145) · b2a7261d

Committed by From00 on Jan 25, 2022

17 Dec, 2021 · 3 commits

fit CI_SKIP_CPP_TEST (#38242) · b613c31e

Committed by Leo Chen on Dec 17, 2021

Get base pointer from Allocation (#37978) · 431a2d6a

Committed by From00 on Dec 17, 2021

* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy

Add GetStream Interface for StreamSafeCUDAAllocator (#38195) · b0d12d99

Committed by From00 on Dec 17, 2021

07 Dec, 2021 · 1 commit

add ipu device p1 (#37841) · c9a3c669

Committed by jianghaicheng on Dec 07, 2021

25 Nov, 2021 · 1 commit

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

Committed by From00 on Nov 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
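
The deferred-free behaviour described in the bullets above can be pictured with a small Python model. This is a sketch under assumptions, not Paddle's C++ implementation: `FakeStream`, `Allocation`, `StreamSafeAllocatorSketch`, and `choose_allocator` are hypothetical names, and the "default" fallback is invented for illustration. Only three ideas come from the commit message: a block is tracked as outstanding at free time rather than at allocation time, a block with an empty `recorded_streams` is deleted directly, and `FLAGS_use_system_allocator` takes priority over `FLAGS_use_stream_safe_cuda_allocator`.

```python
class FakeStream:
    """Hypothetical stand-in for a CUDA stream: counts pending work items."""

    def __init__(self, name):
        self.name = name
        self.pending = 0

    def launch(self):
        # Pretend a kernel that uses some allocation was enqueued.
        self.pending += 1

    def synchronize(self):
        # Pretend all enqueued work has finished.
        self.pending = 0


class Allocation:
    def __init__(self, size, owner):
        self.size = size
        self.owner = owner
        # Streams other than the owner that have used this block.
        self.recorded_streams = set()


class StreamSafeAllocatorSketch:
    def __init__(self):
        # allocation -> streams to wait for; filled on free, not on alloc.
        self.outstanding = {}
        self.released = []

    def alloc(self, size, stream):
        return Allocation(size, stream)

    def record_stream(self, allocation, stream):
        if stream is not allocation.owner:
            allocation.recorded_streams.add(stream)

    def free(self, allocation):
        if not allocation.recorded_streams:
            # Fast path: no other stream used the block, release it now.
            self.released.append(allocation)
            return
        self.outstanding[allocation] = set(allocation.recorded_streams)

    def process_outstanding(self):
        # Release every deferred block whose recorded streams are idle.
        for allocation, streams in list(self.outstanding.items()):
            if all(s.pending == 0 for s in streams):
                del self.outstanding[allocation]
                self.released.append(allocation)


def choose_allocator(use_system_allocator, use_stream_safe_cuda_allocator):
    # The system-allocator flag wins when both flags are set.
    if use_system_allocator:
        return "system"
    return "stream_safe" if use_stream_safe_cuda_allocator else "default"
```

In this model a block that was only ever used on its owning stream is released inside `free`, while a block recorded on another stream stays in `outstanding` until that stream has drained and `process_outstanding` runs.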

08 Nov, 2021 · 1 commit

Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a

Committed by wanghuancoder on Nov 08, 2021

* Use cuda virtual memory management and merge blocks, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* window dll, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* use autogrowthv2 for system allocator, test=develop

* remove ~CUDAVirtualMemAllocator(), test=develop

* refine, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix bug, test=develop

* revert system allocator, test=develop

* revert multiprocessing, test=develop

* fix AutoGrowthBestFitAllocatorV2 mutex, test=develop

* catch cudaErrorInitializationError when create allocator, test=develop

* fix cuMemSetAccess use, test=develop

* refine cuda api use, test=develop

* refine, test=develop

* for test, test=develop

* for test, test=develop

* switch to v2, test=develop

* refine virtual allocator, test=develop

* Record cuMemCreate and cuMemRelease, test=develop

* refine, test=develop

* avoid out of bounds, test=develop

* rename allocator, test=develop

* refine, test=develop

* use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop

* for test,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

29 Sep, 2021 · 1 commit

Add basic support for CUDA Graph (#36190) · 21b93c3d

Committed by Zeng Jinle on Sep 29, 2021

* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error

17 Sep, 2021 · 1 commit

Make flag adding easier (#35823) · 2c781455

Committed by Zeng Jinle on Sep 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

12 May, 2021 · 1 commit

[NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796

Committed by liym27 on May 12, 2021

09 Apr, 2021 · 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op

Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format

Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

07 Apr, 2021 · 1 commit

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

Committed by zhang wenhui on Apr 07, 2021

* Ascend rc (#30483)

* Fix compilation on CANN20.1 and older (#30494)

Fix compilation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build parser for Hcom* operators (#30627)

Build parser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

22 Feb, 2021 · 1 commit

[ROCM] update fluid platform for rocm39 (part4), test=develop (#30936) · 33429630

Committed by Qi Li on Feb 22, 2021

11 Dec, 2020 · 1 commit

Add the strategy of skipping cc/cu test compilation and execution in CI (#29499) · b5d4a1f3

Committed by LoveAn on Dec 11, 2020

* Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop

* fix if error with CI_SKIP_TEST, test=develop

* fix add properties to test error on Linux/MAC, test=develop

* fix set test properties of test_code_generator error, test=develop

* remove test codes and advance judgment of file modification on Linux, test=develop

* rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix

* Add branch judgement on Linux, test=develop

04 Nov, 2020 · 1 commit

[Inference] Memory modification for ShrinkMemory. (#28355) · 05114693

Committed by Wilber on Nov 04, 2020

21 Aug, 2020 · 1 commit

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

Committed by QingshuChen on Aug 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
  * test=kunlun

* support xpu op in separate file
  * test=kunlun

* update XPU error message and remove duplicated code
  * test=kunlun

* minor
  * test=kunlun

* minor
  * test=kunlun

22 Jul, 2020 · 1 commit

fix best_fit_allocator_test on windows, test=develop (#25650) · 417b2439

Committed by Leo Chen on Jul 22, 2020

* fix best_fit_allocator_test on windows, test=develop

* enable best_fit_allocator_test and test_math_op_patch_var_base, test=develop

08 Jun, 2020 · 1 commit

temporarily disable these unittests failed on windows (#24942) · 4058e736

Committed by Zhou Wei on Jun 08, 2020

21 Apr, 2020 · 1 commit

New feature: thread local allocator, test=develop (#23989) · d2584a70

Committed by 石晓伟 on Apr 21, 2020

* add the thread_local_allocator, test=develop

* refactor the thread_local_allocator, test=develop

* provides option setting strategy, test=develop

02 Mar, 2020 · 1 commit

Speed up dygraph DataLoader based on shared memory and LoDTensor serialization (#22541) · 7d8d5734

Committed by Chen Weihang on Mar 02, 2020

* add lodtensor share memory & serialization, test=develop

* fix windows compile error, test=develop

* deal vartype pickle & fix unittest matching error message, test=develop

* update timeout variable name, test=develop

* refactor memory map implement, test=develop

* clear mmap file descriptor when exit unexpectedly, test=develop

* remove the child process fd in advance, test=develop

* remove mmap fds after Queue.put in child process, test=develop

* add hard unittests for register exit func, test=develop

* fix python2 compatibility problem in unittest, test=develop

* fix exception unittest error, test=develop

* polish code based review comment, test=develop

19 Dec, 2019 · 1 commit

Add some debug flags to auto growth allocator (#21766) · aa4d6a5d

Committed by Zeng Jinle on Dec 18, 2019

* add some debug flags to auto growth allocator, test=develop

* add comments about auto growth, test=develop

24 Sep, 2019 · 1 commit

fix cuda dev_ctx allocator cmake deps, test=develop (#19953) · 37f76407

Committed by Zeng Jinle on Sep 24, 2019

11 Sep, 2019 · 1 commit

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, we can remove it for better memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed.

Also added data_feed_proto to operator to fix the CI for CPU compilation.
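
The stream-callback idea in the message above can be illustrated with a small Python model. It is only a sketch under assumptions: `StreamSketch` and `DeviceContextAllocatorSketch` are hypothetical names, and an in-order queue stands in for a real CUDA stream. The real classes are C++ and this code does not reproduce their interfaces. The single point it shows is that the release is queued behind the work that uses the memory, so the block stays valid until that work has run.

```python
from collections import deque


class StreamSketch:
    """Hypothetical in-order queue standing in for a CUDA stream."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, fn):
        # Queue a unit of work (a kernel launch in the real setting).
        self._queue.append(fn)

    def add_callback(self, fn):
        # A callback is queued like any other item, so it runs only
        # after all work that was enqueued before it.
        self._queue.append(fn)

    def synchronize(self):
        # Run everything in submission order.
        while self._queue:
            self._queue.popleft()()


class DeviceContextAllocatorSketch:
    """Per-stream allocator: no global singleton is involved."""

    def __init__(self, stream):
        self.stream = stream
        self.live = set()
        self._next_id = 0

    def allocate(self, size):
        block = (self._next_id, size)
        self._next_id += 1
        self.live.add(block)
        return block

    def release_after_stream_work(self, block):
        # Defer the free until the stream reaches this point.
        self.stream.add_callback(lambda: self.live.discard(block))
```

A caller would allocate a block, enqueue the work that reads it, and then call `release_after_stream_work`; the block leaves `live` only when `synchronize` reaches the queued callback.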

03 Sep, 2019 · 1 commit

fix retry_allocator_test by removing glog envs, test=develop (#19596) · e045aadf

Committed by Zeng Jinle on Sep 03, 2019

01 Sep, 2019 · 1 commit

Add retry_allocator for gpu (#19409) · 0a73f720

Committed by Zeng Jinle on Sep 01, 2019

* add retry_allocator for gpu, test=develop

* follow chengduoZH's comments, test=develop

* follow huihuang's comments,test=develop

* change f,l in enforce.h to be file,line, test=develop

* increase code coverage by adding unittests, test=develop

* fix CMakeLists.txt, test=develop

18 Jul, 2019 · 1 commit

Feature/auto_growth_allocator (#18561) · ae58afc5

Committed by Zeng Jinle on Jul 18, 2019

* feature/auto_growth_allocator, test=develop

* add unittest of AlignedAllocator, test=develop

* try to turn on auto_growth to test on CI, test=develop

* fix segmentation fault in mixed_vector.h, test=develop

* add unittests, test=develop

27 May, 2019 · 1 commit

Code clean of Allocator (#17602) · 4aa931dd

Committed by Zeng Jinle on May 27, 2019

* Revert "Revert "Fix allocator bug""

This reverts commit 174d0d0b.

* Revert "fix travis ci"

This reverts commit 5656fa9f.

test=develop

* add inlined_vector.h, test=develop

* add inlined_vector_test,test=develop

* clean code of allocator,test=develop

* delete zero_size_allocator.h,test=develop

* fix failed unittest,test=develop

23 May, 2019 · 1 commit

Fix allocator bug (#16712) · c6189637

Committed by Zeng Jinle on May 23, 2019

* Revert "Revert "Fix allocator bug""

This reverts commit 174d0d0b.

* Revert "fix travis ci"

This reverts commit 5656fa9f.

test=develop

* add inlined_vector.h, test=develop

* add inlined_vector_test,test=develop

07 May, 2019 · 1 commit

fix retry_allocator (#17245) · 6fafd37e

Committed by Zeng Jinle on May 07, 2019

test=develop

28 Mar, 2019 · 1 commit

Revert "Fix allocator bug" · 174d0d0b

Committed by Zeng Jinle on Mar 28, 2019

add include headers to fix travis-ci
test=develop

25 Mar, 2019 · 1 commit

split PR · c20db635

Committed by sneaxiy on Mar 25, 2019

test=develop

21 Mar, 2019 · 1 commit

add more unittest · 953214ad

Committed by sneaxiy on Mar 19, 2019

modify allocator strategy
remove changes of legacy buddy_allocator
test=develop

18 Mar, 2019 · 1 commit

add auto increment best fit allocator · e893cbd2

Committed by sneaxiy on Mar 18, 2019

test=develop

13 Mar, 2019 · 1 commit

Add memory profiler (#16137) · 09799566

Committed by chengduo on Mar 12, 2019

test=develop

06 Mar, 2019 · 1 commit

add allocator chain to fix bug · 2a639d5c

Committed by sneaxiy on Mar 06, 2019

test=develop

16 Nov, 2018 · 1 commit

Add legacy_allocator · 19e669a9

Committed by Yu Yang on Nov 16, 2018

test=develop

14 Nov, 2018 · 1 commit

Clean interface of allocator · ea81f8ee

Committed by Yu Yang on Nov 14, 2018

Clean managed/unmanaged allocator

09 Nov, 2018 · 1 commit

Add enum AllocatorStrategy · 1420c3b1

Committed by Yu Yang on Nov 09, 2018

test=develop

PaddlePaddle / Paddle: synced successfully about 1 year ago