Commits · feaa97984592e08af313acc9d09c7e07e2fc0499 · BaiXuePrincess / Paddle

01 Apr, 2022 · 1 commit
- Fix compilation errors for gcc-54 (#41228) · 8aef685b
  Committed by From00 on Apr 01, 2022
```
* Fix compilation error for gcc-54

* Remove const for gpuStream_t
```
30 Mar, 2022 · 1 commit

Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d

Committed by From00 on Mar 30, 2022

Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)

* Add new API memory_reserved

* Add memory_allocated, max_memory_reserved and max_memory_allocater

* Fix CI error

* Fix CI error

* Enhance UT

* Add FLAGS_memory_stats_opt

* Add STATS macro functions

* Add StatAllocator

* Fix CI errors

* Add UT

* Fix CI errors

27 Mar, 2022 · 1 commit

Make StreamSafeCUDAAllocator compatible with NaiveBestFit strategy (#40886) · 0ad2e192

Committed by From00 on Mar 27, 2022

* Make StreamSafeCUDAAllocator compatible with NaiveBestFit strategy

* Set FLAGS_use_stream_safe_cuda_allocator to false

* Update

* Remove unnecessary code

* Fix CI errors

* Add UT

25 Mar, 2022 · 1 commit

support multi_dims for tril_triu, *test=kunlun (#40712) · 9ffedcfd

Committed by z8hanghuan on Mar 25, 2022

* support multi_dims for tril_triu, *test=kunlun

* support multi_dims for tril_triu, *test=kunlun

* support multi_dims for tril_triu, *test=kunlun

* update xpu.cmake date, support multi_dims for tril_triu, *test=kunlun

23 Mar, 2022 · 1 commit

Performance optimization for StreamSafeCudaAllocator (#40718) · d8bff988

Committed by From00 on Mar 23, 2022

* Performance optimize

* Optimize GetAllocator, RWLock and ProcessUnfreedAllocation

* Remove test file

* Fix CI error

* Fix CI errors

* Fix CI errors

18 Mar, 2022 · 1 commit
- [NPU] fix no allocator error (#40687) · 8c713223
  Committed by Aganlengzi on Mar 18, 2022
14 Mar, 2022 · 1 commit

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors ... · e553f758

Committed by Zhong Hui on Mar 14, 2022

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors  between python processes. (#37302)

* Add support for paddle.multiprocessing
* move multiprocessing to incubate.

03 Mar, 2022 · 2 commits
- Support cuda graph in StreamSafeCudaAllocator (#39594) · 4c0511fa
  Committed by From00 on Mar 03, 2022
```
* Support cuda graph in StreamSafeCudaAllocator

* Fix CI error

* Arrange AllocatorFacade

* Fix CI error

* Fix CI error

* Fix ROCM Compile error

* Fix ROCM Compile error
```
- [CustomRuntime] migrate CustomRuntime into phi (#39908) · b4665d23
  Committed by ronnywang on Mar 03, 2022
28 Feb, 2022 · 1 commit

Profile Executor (#39641) · 7ecefec3

Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add log for Executor

* Profile Allocators

* Profile Allocators

* adjust interface

* remove lock for set

* fix
Co-authored-by: liutiexing <liutiexing@google.com>

25 Feb, 2022 · 1 commit
- [ROCm] fix Managed Memory Alloc on HIP, test=develop (#39896) · 37cb6f32
  Committed by Qi Li on Feb 25, 2022
```
* [ROCm] fix Managed Memory Alloc on HIP, test=develop

* update, test=develop
```
20 Feb, 2022 · 1 commit

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

15 Feb, 2022 · 1 commit

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

Committed by ronnywang on Feb 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop
Co-authored-by: qili93 <qili93@qq.com>

09 Feb, 2022 · 1 commit
- move stream into pten (#39392) · 266955a9
  Committed by Chen Weihang on Feb 09, 2022
08 Feb, 2022 · 1 commit

Support allocate CUDA managed memory (#39075) · 42910361

Committed by From00 on Feb 08, 2022

* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50

06 Feb, 2022 · 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
27 Jan, 2022 · 1 commit

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

Committed by Aurelius84 on Jan 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according reviewer

25 Jan, 2022 · 1 commit
- Add GetBasePtr interface in paddle::memory (#39145) · b2a7261d
  Committed by From00 on Jan 25, 2022
17 Jan, 2022 · 1 commit

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

13 Jan, 2022 · 1 commit
- splits allocation for pten, test=develop (#38853) · 277cf900
  Committed by 石晓伟 on Jan 13, 2022
04 Jan, 2022 · 1 commit
- [XPU] update XPU device info, test=develop (#37884) · e1187e50
  Committed by Qi Li on Jan 04, 2022
31 Dec, 2021 · 1 commit

[MLU]support calling mlu op from python interface (#38292) · b6bf650a

Committed by fwenguang on Dec 31, 2021

* [MLU]support calling mlu op from python interface

* [MLU]fix

* fix

* [mlu]fix mlu_places

* [mlu]fix required mlu

* fix

* [MLU]fix tensor copy

* [mlu] fix MLUPlace call path

30 Dec, 2021 · 1 commit
- Replace shared_ptr with unique_ptr in base_ptr_test (#38530) · 3f6229c6
  Committed by From00 on Dec 30, 2021
28 Dec, 2021 · 1 commit

Utilize StreamSafeCUDAAllocator to support fast GC in new executor (#37642) · 0c7153a4

Committed by From00 on Dec 28, 2021

* fix reshape move storage error

* remove needless set type

* alloc tensor by shared storage

* Utilize StreamSafeCUDAAllocator to support fast GC in new executor

* Fix compile error for Windows and ROCm

* Fix compile error for Windows

* Modify UT stream_safe_cuda_alloc_test

* Modify UT stream_safe_cuda_alloc_test

* Rewrite fast GC

* Rewrite fast GC

* Fix compile error for BOOST_GET_CONST

* Fix compile error for BOOST_GET_CONST

* Changes default stream for StreamSafeCUDAAllocator

* Fix a small CI error

* Remove some redundant code

* Fix conflict

* Fix compile error for ROCm

* Fix Windoes CI error

* Fix CI error

* Remove some unnecessary code

* Fix CI error

* Add UT for fast GC

* Fix CI error

* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile

* Use RWLock in GetAllocator

* Fix CI error
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

27 Dec, 2021 · 1 commit
- add device-agnostic stream class (#38391) · 6b5e33b4
  Committed by Leo Chen on Dec 27, 2021
```
* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile
```
20 Dec, 2021 · 2 commits
- [MLU]add mlu backend (#38207) · 76514a1f
  Committed by fwenguang on Dec 20, 2021
- Skip zero-size Allocation in RecordStream (#38264) · 48937020
  Committed by From00 on Dec 20, 2021
17 Dec, 2021 · 3 commits
- fit CI_SKIP_CPP_TEST (#38242) · b613c31e
  Committed by Leo Chen on Dec 17, 2021
- Get base pointer from Allocation (#37978) · 431a2d6a
  Committed by From00 on Dec 17, 2021
```
* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy
```
- Add GetStream Interface for StreamSafeCUDAAllocator (#38195) · b0d12d99
  Committed by From00 on Dec 17, 2021
10 Dec, 2021 · 1 commit
- make cuda graph thread local allocator (#37814) · 62b1f38c
  Committed by sneaxiy on Dec 10, 2021
09 Dec, 2021 · 1 commit
- add ipu device p2 (#37840) · cb636a48
  Committed by jianghaicheng on Dec 09, 2021
08 Dec, 2021 · 1 commit

Fix CUDAGraphAllocator bug for StreamSafeCUDAAllocator (#37821) · b4a67491

Committed by From00 on Dec 08, 2021

* Fix CUDAGraph bug for StreamSafeCUDAAllocator

* Add CUDAGrapthAllocator check in multi-stream interface

* Set FLAGS_use_stream_safe_cuda_allocator defaulted to false

* Fix environment error for cmake

* Fix cmake error

* Add UT of GetAllocatorInterfaceTest

* Add UT of CUDAGraphExceptionTest

* Enhance CUDAGraphExceptionTest

07 Dec, 2021 · 1 commit
- add ipu device p1 (#37841) · c9a3c669
  Committed by jianghaicheng on Dec 07, 2021
03 Dec, 2021 · 1 commit
- refine structure for cuda and rocm (#37202) · a6d2fddb
  Committed by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
01 Dec, 2021 · 1 commit
- add vlog to auto_growth_best_fit_allocator (#37601) · 934e5d09
  Committed by Leo Chen on Dec 01, 2021
27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

25 Nov, 2021 · 1 commit

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

Committed by From00 on Nov 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator

23 Nov, 2021 · 1 commit
- [XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
  Committed by Qi Li on Nov 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```
22 Nov, 2021 · 1 commit
- fix cuda_virtual_mem_allocator a bug, test=develop (#37390) · e28d5b89
  Committed by wanghuancoder on Nov 22, 2021

BaiXuePrincess / Paddle is in sync with the project it was forked from.