    Support multi-stream allocation for CUDA place (#37290) · b9c464c3
    From00 committed
    * Support multi-stream allocation for CUDA place
    
    * Do not notify retries from other streams when freeing a CUDA allocation
    
    * Fix compile error for CPU
    
    * Fix compile error for HIP
    
    * Release memory for StreamSafeCUDAAllocaRetry in malloc_test
    
    * Add FLAGS_use_stream_safe_cuda_allocator
    
    * Fix CI error for 'set_tests_properties'
    
    * Disable the stream-safe CUDA allocator for the naive_best_fit and thread_local strategies
    
    * Performance improvement: insert the allocation pair into outstanding_events_map on free rather than on alloc; replace recursive_mutex with SpinLock
    
    * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
    
    * Performance improvement: delete the allocation directly when recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
    
    * Add UT for alloc interface
    
    * Change the multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
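The free-path behaviour described in the commit message (delete immediately when no other stream has been recorded, otherwise park the allocation until those streams finish, with a spin lock guarding the bookkeeping) can be illustrated with a small host-only sketch. This is not the code from the commit: `FakeStream`, `StreamSafeAllocatorSketch`, `RecordStream`, and `ProcessOutstanding` are hypothetical names, and a plain `pending` counter stands in for CUDA events so the example compiles without a GPU. A real allocator would record an event at free time and query it, which this sketch simplifies to checking the stream's current state.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical stand-in for a CUDA stream: `pending` counts work still in flight.
struct FakeStream {
  int pending = 0;
};

// Minimal spin lock on std::atomic_flag; usable with std::lock_guard.
class SpinLock {
 public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Busy-wait until the holder clears the flag.
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

struct Allocation {
  std::size_t size;
  std::set<FakeStream*> recorded_streams;  // other streams that used this memory
};

class StreamSafeAllocatorSketch {
 public:
  Allocation* Alloc(std::size_t size) {
    std::lock_guard<SpinLock> guard(lock_);
    ++live_;
    return new Allocation{size, {}};
  }

  // Note that `stream` has touched `a`, so freeing must wait for it.
  void RecordStream(Allocation* a, FakeStream* stream) {
    std::lock_guard<SpinLock> guard(lock_);
    a->recorded_streams.insert(stream);
  }

  void Free(Allocation* a) {
    std::lock_guard<SpinLock> guard(lock_);
    if (a->recorded_streams.empty()) {
      Delete(a);  // Fast path: no other stream used it, release at once.
      return;
    }
    // Slow path: the bookkeeping entry is created on free, not on alloc.
    outstanding_[a] = std::vector<FakeStream*>(a->recorded_streams.begin(),
                                               a->recorded_streams.end());
  }

  // Release every parked allocation whose recorded streams are all idle.
  void ProcessOutstanding() {
    std::lock_guard<SpinLock> guard(lock_);
    for (auto it = outstanding_.begin(); it != outstanding_.end();) {
      bool done = true;
      for (FakeStream* s : it->second) {
        if (s->pending > 0) done = false;
      }
      if (done) {
        Delete(it->first);
        it = outstanding_.erase(it);
      } else {
        ++it;
      }
    }
  }

  int live() const { return live_; }
  std::size_t outstanding() const { return outstanding_.size(); }

 private:
  void Delete(Allocation* a) {
    delete a;
    --live_;
  }

  SpinLock lock_;
  int live_ = 0;
  std::map<Allocation*, std::vector<FakeStream*>> outstanding_;
};
```

In this sketch an allocation that was never recorded on another stream costs one lock acquisition and no map traffic on free, which is the point of the two "Performance improvement" items; only allocations shared across streams pay for the deferred path.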
stream_safe_cuda_alloc_test.cu 6.8 KB