- 01 Apr, 2022: 1 commit
Committed by From00
* Fix compilation error for gcc-54
* Remove const for gpuStream_t
- 27 Mar, 2022: 1 commit
Committed by From00
* Make StreamSafeCUDAAllocator compatible with NaiveBestFit strategy
* Set FLAGS_use_stream_safe_cuda_allocator to false
* Update
* Remove unnecessary code
* Fix CI errors
* Add UT
- 03 Mar, 2022: 1 commit
Committed by From00
* Support cuda graph in StreamSafeCudaAllocator
* Fix CI error
* Arrange AllocatorFacade
* Fix CI error
* Fix CI error
* Fix ROCM Compile error
* Fix ROCM Compile error
- 20 Feb, 2022: 1 commit
Committed by Chen Weihang
* rename pten dir to phi
* rename namespace to phi
* rename infrt pten dir to phi
* resolve conflict
* rename pten to phi in cmake
* revert all infrt change
* change needed files
* fix infrt failed
* fix inference failed
- 09 Feb, 2022: 1 commit
Committed by Chen Weihang
- 25 1月, 2022 1 次提交
-
-
由 From00 提交于
-
- 28 Dec, 2021: 1 commit
Committed by From00
* fix reshape move storage error
* remove needless set type
* alloc tensor by shared storage
* Utilize StreamSafeCUDAAllocator to support fast GC in new executor
* Fix compile error for Windows and ROCm
* Fix compile error for Windows
* Modify UT stream_safe_cuda_alloc_test
* Modify UT stream_safe_cuda_alloc_test
* Rewrite fast GC
* Rewrite fast GC
* Fix compile error for BOOST_GET_CONST
* Fix compile error for BOOST_GET_CONST
* Change default stream for StreamSafeCUDAAllocator
* Fix a small CI error
* Remove some redundant code
* Fix conflict
* Fix compile error for ROCm
* Fix Windows CI error
* Fix CI error
* Remove some unnecessary code
* Fix CI error
* Add UT for fast GC
* Fix CI error
* add device-agnostic stream class
* add stream.h
* fix ut
* fix cpu compile
* Use RWLock in GetAllocator
* Fix CI error
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
- 27 Dec, 2021: 1 commit
Committed by Leo Chen
* add device-agnostic stream class
* add stream.h
* fix ut
* fix cpu compile
- 17 Dec, 2021: 1 commit
Committed by From00
- 25 Nov, 2021: 1 commit
Committed by From00
* Support multi-stream allocation for CUDA place
* Do not notify the retrying from other streams when free CUDA allocation
* Fix compile error for CPU
* Fix compile error for HIP
* Release memory for StreamSafeCUDAAllocaRetry in malloc_test
* Add FLAGS_use_stream_safe_cuda_allocator
* Fix CI error for 'set_tests_properties'
* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy
* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock
* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
* Add UT for alloc interface
* Change multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
- 04 2月, 2021 1 次提交
-
-
由 wanghuancoder 提交于
* use iwyu clean include second time, test=develop
-
- 06 Nov, 2020: 1 commit
Committed by Wilber
- 04 11月, 2020 1 次提交
-
-
由 Wilber 提交于
-
- 24 Sep, 2019: 1 commit
Committed by Zeng Jinle
- 11 Sep, 2019: 1 commit
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used to allocate memory for cuDNN. Removing the singleton improves memory performance. This commit replaces TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed. It also adds data_feed_proto to operator to fix the CI failure in CPU compilation.
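The stream-callback idea in this commit message can be illustrated with a minimal, CPU-only sketch. It is not the code from the commit: `FakeStream` and `StreamBoundAllocator` are hypothetical names invented here, and a plain FIFO queue stands in for a GPU stream. The sketch only shows the ordering guarantee, namely that a freed buffer is released after the work already queued on its stream has run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a GPU stream: queued tasks run in FIFO order.
class FakeStream {
 public:
  void Enqueue(std::function<void()> fn) { tasks_.push(std::move(fn)); }

  // Runs every queued task in order, similar to synchronizing a stream.
  void Synchronize() {
    while (!tasks_.empty()) {
      std::function<void()> fn = std::move(tasks_.front());
      tasks_.pop();
      fn();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Hypothetical allocator bound to one stream instead of a global singleton.
class StreamBoundAllocator {
 public:
  explicit StreamBoundAllocator(FakeStream* stream) : stream_(stream) {}

  void* Allocate(std::size_t bytes) {
    ++live_;
    return std::malloc(bytes);
  }

  // Does not release memory immediately. The release is queued as a
  // callback, so it runs only after earlier work on the stream is done.
  void Free(void* ptr) {
    stream_->Enqueue([this, ptr] {
      std::free(ptr);
      --live_;
    });
  }

  // Number of allocations that have not been released yet.
  int live() const { return live_; }

 private:
  FakeStream* stream_;
  int live_ = 0;
};
```

In this sketch, calling `Free` leaves `live()` unchanged until `Synchronize()` drains the queue. A real GPU version would register the release through the device runtime's stream callback mechanism rather than a host-side queue.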
- 10 Jun, 2019: 1 commit
Committed by Zeng Jinle
* remove attribute in Allocator::Allocate, test=develop
* fix travis ci error, test=develop
- 16 Nov, 2018: 1 commit
Committed by Yu Yang
test=develop
- 14 Nov, 2018: 2 commits
- 09 Nov, 2018: 2 commits
- 08 Nov, 2018: 1 commit
Committed by minqiyang
Fix code to support cpplint syntax check test=develop
- 19 Oct, 2018: 1 commit
Committed by sneaxiy
- 10 Oct, 2018: 1 commit
Committed by sneaxiy
- 28 Sep, 2018: 2 commits
Committed by Yan Chunwei
* add naive executor
* fix concurrency performance issue
Committed by Yu Yang
Use OO style to rewrite memory allocation.
- 08 Aug, 2018: 1 commit
Committed by Dun Liang
- 09 Jul, 2018: 1 commit
Committed by gongweibao
- 29 Jun, 2018: 1 commit
Committed by chengduo
* memory init
* add env
* refine announce
* Add check for Nan
* Debug
* Add env for cc_test
* Add env for py_test and nv_test
* Remove py_test env
* Add env for py_test
* serial test_recognize_digits
* Test FLAGS_init_allocated_mem function for unit test
* Init allocated mem for op unit test
* Add env for all unit test
- 08 Apr, 2018: 3 commits
- 02 Apr, 2018: 1 commit
Committed by chengduoZH
- 28 Mar, 2018: 1 commit
Committed by chengduoZH
- 27 Mar, 2018: 1 commit
Committed by chengduoZH
- 26 Mar, 2018: 3 commits
Committed by chengduoZH
Committed by chengduoZH
Committed by chengduoZH
- 20 Mar, 2018: 2 commits