- 26 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 05 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 27 3月, 2022 1 次提交
-
-
由 From00 提交于
* Make StreamSafeCUDAAllocator compatible with NaiveBestFit strategy * Set FLAGS_use_stream_safe_cuda_allocator to false * Update * Remove unnecessary code * Fix CI errors * Add UT
-
- 03 3月, 2022 1 次提交
-
-
由 From00 提交于
* Support cuda graph in StreamSafeCudaAllocator * Fix CI error * Arrange AllocatorFacade * Fix CI error * Fix CI error * Fix ROCM Compile error * Fix ROCM Compile error
-
- 20 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* rename pten dir to phi * rename namespace to phi * rename infrt pten dir to phi * resolve conflict * rename pten to phi in cmake * revert all infrt change * change needed files * fix infrt failed * fix inference failed
-
- 09 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
-
- 05 1月, 2022 1 次提交
-
-
由 From00 提交于
* Fix bug of GetAllocatorInterfaceTest * Replace some shared_ptr with unique_ptr * Change Alloc call
-
- 27 12月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* add device-agnostic stream class * add stream.h * fix ut * fix cpu compile
-
- 20 12月, 2021 1 次提交
-
-
由 From00 提交于
-
- 17 12月, 2021 1 次提交
-
-
由 From00 提交于
-
- 08 12月, 2021 1 次提交
-
-
由 From00 提交于
* Fix CUDAGraph bug for StreamSafeCUDAAllocator * Add CUDAGrapthAllocator check in multi-stream interface * Set FLAGS_use_stream_safe_cuda_allocator defaulted to false * Fix environment error for cmake * Fix cmake error * Add UT of GetAllocatorInterfaceTest * Add UT of CUDAGraphExceptionTest * Enhance CUDAGraphExceptionTest
-
- 03 12月, 2021 1 次提交
-
-
由 ronnywang 提交于
* refine structure for cuda and rocm * update * update * update * update
-
- 25 11月, 2021 1 次提交
-
-
由 From00 提交于
* Support multi-stream allocation for CUDA place * Do not notify the retrying from other streams when free CUDA allocation * Fix compile error for CPU * Fix compile error for HIP * Release memory for StreamSafeCUDAAllocaRetry in malloc_test * Add FLAGS_use_stream_safe_cuda_allocator * Fix CI error for 'set_tests_properties' * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy * Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator * Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator * Add UT for alloc interface * Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
-