Commits · 42910361d2997e60e0c5c14edd7418f556d97272 · 机器未来 / Paddle

08 Feb, 2022 · 1 commit

Support allocate CUDA managed memory (#39075) · 42910361

Authored by From00 on Feb 08, 2022

* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50

25 Jan, 2022 · 1 commit

Add GetBasePtr interface in paddle::memory (#39145) · b2a7261d

Authored by From00 on Jan 25, 2022

08 Dec, 2021 · 1 commit

Fix CUDAGraphAllocator bug for StreamSafeCUDAAllocator (#37821) · b4a67491

Authored by From00 on Dec 08, 2021

* Fix CUDAGraph bug for StreamSafeCUDAAllocator

* Add CUDAGraphAllocator check in multi-stream interface

* Set FLAGS_use_stream_safe_cuda_allocator defaulted to false

* Fix environment error for cmake

* Fix cmake error

* Add UT of GetAllocatorInterfaceTest

* Add UT of CUDAGraphExceptionTest

* Enhance CUDAGraphExceptionTest

25 Nov, 2021 · 1 commit

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

Authored by From00 on Nov 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
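
The deferred-free behaviour described in the bullets above (release immediately when `recorded_streams` is empty, otherwise register events at free time and release once they complete) can be sketched on the host alone. This is a minimal illustration, not Paddle's implementation: `StreamSafeSketch`, `MockEvent`, `SimulateStreamFinished` and `ProcessOutstanding` are hypothetical names, streams are plain integers, and an event is a flag flipped by hand instead of a real CUDA event.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// All names below are hypothetical stand-ins, not Paddle's real classes.
using StreamId = int;

// Mock of a device event: `done` is flipped by hand to simulate the stream
// reaching the point where the event was recorded.
struct MockEvent {
  StreamId stream;
  bool done;
};

struct Allocation {
  std::size_t size;
  StreamId owner;                       // stream the block was allocated on
  std::set<StreamId> recorded_streams;  // other streams that touched it
};

class StreamSafeSketch {
 public:
  Allocation* Alloc(std::size_t size, StreamId owner) {
    ++live_;
    return new Allocation{size, owner, {}};
  }

  // Note that `stream` also uses the block, so a later free must wait for it.
  void RecordStream(Allocation* a, StreamId stream) {
    if (stream != a->owner) a->recorded_streams.insert(stream);
  }

  void Free(Allocation* a) {
    if (a->recorded_streams.empty()) {  // fast path: nothing to wait for
      Release(a);
      return;
    }
    // Events are created only now, at free time, one per recorded stream.
    std::vector<MockEvent> events;
    for (StreamId s : a->recorded_streams) events.push_back(MockEvent{s, false});
    outstanding_events_[a] = events;
  }

  // Test hook: pretend `stream` has finished all work queued so far.
  void SimulateStreamFinished(StreamId stream) {
    for (auto& entry : outstanding_events_)
      for (auto& e : entry.second)
        if (e.stream == stream) e.done = true;
  }

  // Release every deferred block whose events have all completed.
  void ProcessOutstanding() {
    for (auto it = outstanding_events_.begin();
         it != outstanding_events_.end();) {
      const std::vector<MockEvent>& events = it->second;
      bool all_done = std::all_of(events.begin(), events.end(),
                                  [](const MockEvent& e) { return e.done; });
      if (all_done) {
        Release(it->first);
        it = outstanding_events_.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::size_t live() const { return live_; }
  std::size_t pending() const { return outstanding_events_.size(); }

 private:
  void Release(Allocation* a) {
    delete a;
    --live_;
  }

  std::map<Allocation*, std::vector<MockEvent>> outstanding_events_;
  std::size_t live_ = 0;
};
```

In this sketch a block used only by its owning stream is released directly in `Free`, while a block recorded on another stream stays alive until that stream's mock event is marked done and `ProcessOutstanding` runs.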

01 Feb, 2021 · 1 commit

[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758) · 69875dc4

Authored by Qi Li on Feb 01, 2021

14 Jan, 2021 · 1 commit

fix bug that can't find mkldnn(kunlun) (#30394) · cf786d22

Authored by QingshuChen on Jan 14, 2021

14 Jan, 2020 · 1 commit

faster build by reduce by-product, reduce linking library and fix compile... · 549e6de7

Authored by zhouwei25 on Jan 14, 2020

```
faster build by reduce by-product, reduce linking library and fix compile warning of std=c++11 (#22164)
```

24 Sep, 2019 · 1 commit

fix cuda dev_ctx allocator cmake deps, test=develop (#19953) · 37f76407

Authored by Zeng Jinle on Sep 24, 2019

11 Sep, 2019 · 1 commit

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Authored by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it gives better memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed.

Also added data_feed_proto to operator to fix CI in CPU compilation.
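
The stream-callback release described in this commit message can be illustrated with a host-only mock. This is a sketch under stated assumptions, not the CUDADeviceContextAllocation code: `MockStream` and `StreamCallbackAllocation` are hypothetical names, the "stream" is a FIFO queue drained by `Synchronize()`, and `std::malloc`/`std::free` stand in for device memory calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical mock of a device stream: queued work runs in FIFO order when
// Synchronize() is called. A real CUDA stream runs it asynchronously.
class MockStream {
 public:
  void Enqueue(std::function<void()> task) { tasks_.push(std::move(task)); }

  void Synchronize() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      task();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Hypothetical allocation whose destructor does not free the memory itself.
// It queues a callback on the stream, so the free happens only after all work
// enqueued on that stream earlier has run.
class StreamCallbackAllocation {
 public:
  StreamCallbackAllocation(MockStream* stream, std::size_t size,
                           int* live_blocks)
      : stream_(stream), ptr_(std::malloc(size)), live_blocks_(live_blocks) {
    ++*live_blocks_;
  }

  StreamCallbackAllocation(const StreamCallbackAllocation&) = delete;
  StreamCallbackAllocation& operator=(const StreamCallbackAllocation&) = delete;

  ~StreamCallbackAllocation() {
    void* ptr = ptr_;
    int* live = live_blocks_;
    stream_->Enqueue([ptr, live] {
      std::free(ptr);  // stand-in for releasing device memory
      --*live;
    });
  }

  void* ptr() const { return ptr_; }

 private:
  MockStream* stream_;
  void* ptr_;
  int* live_blocks_;  // counter owned by the caller, used only for the demo
};
```

Because the free is tied to the stream rather than to a global object, each allocation cleans up after itself and no process-wide singleton has to track temporary buffers.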

11 Mar, 2019 · 1 commit

Revert "Revert "Add Event for TensorCopy"" (#16035) · ad80bde8

Authored by chengduo on Mar 11, 2019

* Revert "Revert "Add Event for TensorCopy" (#16022)"

This reverts commit e2da3a5b.

* use default stream
test=develop

04 Mar, 2019 · 3 commits

Revert "Add Event for TensorCopy" (#16022) · 92438f61

Authored by chengduo on Mar 03, 2019

```
* Revert "Add Event for TensorCopy (#15953)"

This reverts commit 7235fd66.
test=develop

* fix CI
test=develop
```

Add Event for TensorCopy (#15953) · 06f3c857

Authored by chengduo on Mar 01, 2019

```
Add Event for TensorCopy
```

Revert "Add Event for TensorCopy" (#16022) · e2da3a5b

Authored by chengduo on Mar 03, 2019

```
* Revert "Add Event for TensorCopy (#15953)"

This reverts commit 7235fd66.
test=develop

* fix CI
test=develop
```

01 Mar, 2019 · 1 commit

Add Event for TensorCopy (#15953) · 7235fd66

Authored by chengduo on Mar 01, 2019

```
Add Event for TensorCopy
```

02 Feb, 2019 · 2 commits

fix dependency · 061299be

Authored by peizhilin on Feb 02, 2019

```
test=develop
```

test=develop · db563ec2

Authored by peizhilin on Feb 02, 2019

16 Nov, 2018 · 1 commit

Add legacy_allocator · 19e669a9

Authored by Yu Yang on Nov 16, 2018

```
test=develop
```

10 Oct, 2018 · 1 commit

add support to old allocator · e2780623

Authored by sneaxiy on Oct 10, 2018

28 Sep, 2018 · 1 commit

refactor(memory): rewrite memory allocation and make it extensible · 58ed412f

Authored by Yu Yang on Sep 28, 2018

```
Use OO style to rewrite memory allocation.
```

08 Apr, 2018 · 3 commits

Update paddle_memory in CMakeLists.txt files · 45bc4538

Authored by Yi Wang on Apr 07, 2018

Update CMakeLists · 67ba884d

Authored by Yi Wang on Apr 07, 2018

Rewrite the interface of memory/detail · 402a9f1f

Authored by Yi Wang on Apr 07, 2018

03 Apr, 2018 · 2 commits

follow comments · 51c22fe4

Authored by chengduoZH on Apr 03, 2018

follow comments · 766c7405

Authored by chengduoZH on Apr 03, 2018

30 Mar, 2018 · 1 commit

compare the performance of unpinned memory and pinned memory · ffa63974

Authored by chengduoZH on Mar 29, 2018

10 Feb, 2018 · 1 commit

Move file to fluid/; Edit CMakeLists.txt · 90648f33

Authored by Yi Wang on Feb 09, 2018

06 Feb, 2018 · 1 commit

add independent inference_lib.cmake · 59e4dd57

Authored by Luo Tao on Feb 06, 2018

30 Jan, 2018 · 1 commit

make inference_lib_dist · 9b5d41b6

Authored by Luo Tao on Jan 30, 2018

23 Jan, 2018 · 2 commits

Fix the cmake dependence. · a2b560d2

Authored by dangqingqing on Jan 23, 2018

Fix the dependence. · 608ebece

Authored by dangqingqing on Jan 23, 2018

16 Jan, 2018 · 1 commit

add paddle INSTALL for fluid api · 2be7cf90

Authored by Luo Tao on Jan 16, 2018

24 Nov, 2017 · 1 commit

Make enforce target (#5889) · c9172c1c

Authored by Qiao Longfei on Nov 24, 2017

* make enforce a target and dependent on nccl when gpu is enabled

* add some more dependency

28 Oct, 2017 · 1 commit

Add debug logs in scope, meta_cache and memory (#5170) · 2a5edec0

Authored by Yu Yang on Oct 27, 2017

```
* Add debug logs in scope, meta_cache and memory

* Add missing deps
```

15 Aug, 2017 · 1 commit

fix gpu build error · f168843e

Authored by qijun on Aug 15, 2017

04 Aug, 2017 · 2 commits

remove duplicate cpplint · 051fe172

Authored by liaogang on Aug 04, 2017

Add cpplint for *.h and cuda *.cu · b58725bd

Authored by liaogang on Aug 04, 2017

29 Jul, 2017 · 1 commit

Fix build · cffd1ae4

Authored by Helin Wang on Jul 28, 2017

27 Jul, 2017 · 1 commit

Remove GPUPlaceGuard · b4ff2e43

Authored by liaogang on Jul 27, 2017

23 Jul, 2017 · 2 commits

Add dependency memory->device_context, because we now use platform::GPUPlaceGuard · de6f9c48

Authored by Yi Wang on Jul 22, 2017

Add dependency memory->device_context, because we now use platform::GPUPlaceGuard · f81caa4e

Authored by Yi Wang on Jul 22, 2017

机器未来 / Paddle is in sync with the upstream project it was forked from.