- 31 Dec, 2021: 1 commit
  - Committed by fwenguang:
    * [MLU] support calling mlu op from python interface
    * [MLU] fix
    * fix
    * [mlu] fix mlu_places
    * [mlu] fix required mlu
    * fix
    * [MLU] fix tensor copy
    * [mlu] fix MLUPlace call path
- 30 Dec, 2021: 1 commit
  - Committed by From00
- 29 Dec, 2021: 1 commit
  - Committed by Huihuang Zheng: Fix Buddy Allocator random CI failure due to machine environment.
- 28 Dec, 2021: 1 commit
  - Committed by From00:
    * fix reshape move storage error
    * remove needless set type
    * alloc tensor by shared storage
    * Utilize StreamSafeCUDAAllocator to support fast GC in new executor
    * Fix compile error for Windows and ROCm
    * Fix compile error for Windows
    * Modify UT stream_safe_cuda_alloc_test
    * Modify UT stream_safe_cuda_alloc_test
    * Rewrite fast GC
    * Rewrite fast GC
    * Fix compile error for BOOST_GET_CONST
    * Fix compile error for BOOST_GET_CONST
    * Change default stream for StreamSafeCUDAAllocator
    * Fix a small CI error
    * Remove some redundant code
    * Fix conflict
    * Fix compile error for ROCm
    * Fix Windows CI error
    * Fix CI error
    * Remove some unnecessary code
    * Fix CI error
    * Add UT for fast GC
    * Fix CI error
    * add device-agnostic stream class
    * add stream.h
    * fix ut
    * fix cpu compile
    * Use RWLock in GetAllocator
    * Fix CI error

    Co-authored-by: Chen Weihang <chenweihang@baidu.com>
    Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
- 27 Dec, 2021: 1 commit
  - Committed by Leo Chen:
    * add device-agnostic stream class
    * add stream.h
    * fix ut
    * fix cpu compile
- 22 Dec, 2021: 1 commit
  - Committed by Yang
- 20 Dec, 2021: 2 commits
- 17 Dec, 2021: 3 commits
  - Committed by Leo Chen
  - Committed by From00:
    * Get GPU BasePtr from CUDA allocation
    * Fix compile error for ROCm
    * Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc
    * Add alignment for BuddyAllocator
    * Set address alignment of BuddyAllocator to 32 bytes
    * Fix CI error
    * Remove code for naive_best_fit strategy
  - Committed by From00
- 13 Dec, 2021: 1 commit
  - Committed by taixiurong
- 10 Dec, 2021: 1 commit
  - Committed by sneaxiy
- 09 Dec, 2021: 1 commit
  - Committed by jianghaicheng
- 08 Dec, 2021: 1 commit
  - Committed by From00:
    * Fix CUDAGraph bug for StreamSafeCUDAAllocator
    * Add CUDAGraphAllocator check in multi-stream interface
    * Set default of FLAGS_use_stream_safe_cuda_allocator to false
    * Fix environment error for cmake
    * Fix cmake error
    * Add UT of GetAllocatorInterfaceTest
    * Add UT of CUDAGraphExceptionTest
    * Enhance CUDAGraphExceptionTest
- 07 Dec, 2021: 1 commit
  - Committed by jianghaicheng
- 03 Dec, 2021: 1 commit
  - Committed by ronnywang:
    * refine structure for cuda and rocm
    * update
    * update
    * update
    * update
- 01 Dec, 2021: 1 commit
  - Committed by Leo Chen
- 29 Nov, 2021: 1 commit
  - Committed by piotrekobiIntel
- 27 Nov, 2021: 1 commit
  - Committed by Aganlengzi:
    * [NPU] reorganization for device API abstraction
    * [NPU] delete old files
    * [NPU] fix npu_collective_helper
    * [NPU] fix collective_helper
    * [NPU] fix ut
    * [NPU] mod memory allocation and hccl_helper
    * [NPU] fix place_type
    * [NPU] split enforce.h
    * move acl* call into npu_info
    * merge conflict
    * fix merge
    * merge conflict
    * merge conflict
- 25 Nov, 2021: 1 commit
  - Committed by From00:
    * Support multi-stream allocation for CUDA place
    * Do not notify the retrying from other streams when freeing a CUDA allocation
    * Fix compile error for CPU
    * Fix compile error for HIP
    * Release memory for StreamSafeCUDAAllocaRetry in malloc_test
    * Add FLAGS_use_stream_safe_cuda_allocator
    * Fix CI error for 'set_tests_properties'
    * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy
    * Performance improvement: insert allocation pair into outstanding_events_map on free instead of on alloc; replace recursive_mutex with SpinLock
    * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
    * Performance improvement: directly delete allocation when recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
    * Add UT for alloc interface
    * Change multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
- 23 Nov, 2021: 1 commit
  - Committed by Qi Li:
    * [XPU] Reorganize xpu device codes in platform, test=develop
    * fix xpu_header.h, test=develop
- 22 Nov, 2021: 1 commit
  - Committed by wanghuancoder
- 17 Nov, 2021: 1 commit
  - Committed by WangXi
- 08 Nov, 2021: 1 commit
  - Committed by wanghuancoder:
    * Use cuda virtual memory management and merge blocks, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * window dll, test=develop
    * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
    * use autogrowthv2 for system allocator, test=develop
    * remove ~CUDAVirtualMemAllocator(), test=develop
    * refine, test=develop
    * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
    * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
    * fix bug, test=develop
    * revert system allocator, test=develop
    * revert multiprocessing, test=develop
    * fix AutoGrowthBestFitAllocatorV2 mutex, test=develop
    * catch cudaErrorInitializationError when creating allocator, test=develop
    * fix cuMemSetAccess use, test=develop
    * refine cuda api use, test=develop
    * refine, test=develop
    * for test, test=develop
    * for test, test=develop
    * switch to v2, test=develop
    * refine virtual allocator, test=develop
    * Record cuMemCreate and cuMemRelease, test=develop
    * refine, test=develop
    * avoid out of bounds, test=develop
    * rename allocator, test=develop
    * refine, test=develop
    * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop
    * for test, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
- 01 Nov, 2021: 1 commit
  - Committed by Leo Chen:
    * refine vlog of interpretercore
    * fix ut
- 11 Oct, 2021: 1 commit
  - Committed by Leo Chen:
    * do not use alignedAllocator when cuda has alignment
    * update test
    * fix error during multiple process
- 29 Sep, 2021: 2 commits
  - Committed by Zeng Jinle:
    * add basic support for CUDA Graph
    * fix ci compile error
    * fix LOG print, fix windows CI
    * follow comments and update
    * small fix for default ctor
    * fix rocm compile error
    * fix CPU compile error
  - Committed by liutiexing:
    * add align for WorkQueue
    * add spinlock
    * merge spinlock
- 22 Sep, 2021: 1 commit
  - Committed by Tomasz Socha:
    * Fix copy elision warning
    * Remove redundant code
- 17 Sep, 2021: 1 commit
  - Committed by Zeng Jinle:
    * make flag setter easier
    * update
    * rename macro name
    * fix bug of public/writable
    * update to pass CI
    * polish
    * fix CPU link error
- 11 Sep, 2021: 1 commit
  - Committed by wanghuancoder:
    * refactor gc, test=develop
    * refine, test=develop
    * refine, test=develop
    * refine, test=develop
    * gc each tensor, test=develop
    * refine, test=develop
- 03 Sep, 2021: 1 commit
  - Committed by Leo Chen
- 26 Aug, 2021: 1 commit
  - Committed by wanghuancoder:
    * use spinlock in auto growth, test=develop
    * refine, test=develop
- 23 Aug, 2021: 1 commit
  - Committed by wanghuancoder: This reverts commit 6bacfb0e.
- 20 Aug, 2021: 1 commit
  - Committed by wanghuancoder:
    * use spin lock in auto growth allocator, test=develop
    * use pthread spin lock, test=develop
    * use lock guard, test=develop
    * use malloc spin lock, test=develop
    * use lock_guard, test=develop
- 09 Aug, 2021: 1 commit
  - Committed by Leo Chen:
    * add lock
    * fix typo
- 03 Aug, 2021: 1 commit
  - Committed by QingshuChen:
    * support Kunlun2
    * support KL2
    * support KL2
- 19 Jul, 2021: 1 commit
  - Committed by Qi Li
- 28 Jun, 2021: 1 commit
  - Committed by Qi Li:
    * [ROCM] fix RNN miopen as weight needs to be permuted, test=develop
    * [ROCM] fix data share when is_test, test=develop
    * update, test=develop