1. 08 Feb, 2022 1 commit
    •
      Support allocate CUDA managed memory (#39075) · 42910361
      Committed by From00
      * Rough implementation for experiment
      
      * Support allocate cuda managed memory
      
      * Fix CI error
      
      * Modify UT
      
      * Check whether memory oversubscription is supported
      
      * Fix ROCM Compile error
      
      * Fix ROCM Compile error
      
      * Fix UT cuda_managed_memory_test
      
      * Set UT timeout to 40
      
      * Add UT OOMExceptionTest
      
      * Set UT timeout to 50
  2. 25 Jan, 2022 1 commit
  3. 08 Dec, 2021 1 commit
    •
      Fix CUDAGraphAllocator bug for StreamSafeCUDAAllocator (#37821) · b4a67491
      Committed by From00
      * Fix CUDAGraph bug for StreamSafeCUDAAllocator
      
      * Add CUDAGraphAllocator check in multi-stream interface
      
      * Set FLAGS_use_stream_safe_cuda_allocator to default to false
      
      * Fix environment error for cmake
      
      * Fix cmake error
      
      * Add UT of GetAllocatorInterfaceTest
      
      * Add UT of CUDAGraphExceptionTest
      
      * Enhance CUDAGraphExceptionTest
  4. 25 Nov, 2021 1 commit
    •
      Support multi-stream allocation for CUDA place (#37290) · b9c464c3
      Committed by From00
      * Support multi-stream allocation for CUDA place
      
      * Do not notify retries from other streams when freeing a CUDA allocation
      
      * Fix compile error for CPU
      
      * Fix compile error for HIP
      
      * Release memory for StreamSafeCUDAAllocaRetry in malloc_test
      
      * Add FLAGS_use_stream_safe_cuda_allocator
      
      * Fix CI error for 'set_tests_properties'
      
      * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy
      
      * Performance improvement: insert the allocation pair into outstanding_events_map on free rather than on alloc; replace recursive_mutex with SpinLock
      
      * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
      
      * Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
      
      * Add UT for alloc interface
      
      * Change the multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
  5. 01 Feb, 2021 1 commit
  6. 14 Jan, 2021 1 commit
  7. 14 Jan, 2020 1 commit
  8. 24 Sep, 2019 1 commit
  9. 11 Sep, 2019 1 commit
    •
      Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Committed by Huihuang Zheng
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, we can delete it for better memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to delete the memory allocated for the stream, avoiding the singleton.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
  10. 11 Mar, 2019 1 commit
  11. 04 Mar, 2019 3 commits
  12. 01 Mar, 2019 1 commit
  13. 02 Feb, 2019 2 commits
  14. 16 Nov, 2018 1 commit
  15. 10 Oct, 2018 1 commit
  16. 28 Sep, 2018 1 commit
  17. 08 Apr, 2018 3 commits
  18. 03 Apr, 2018 2 commits
  19. 30 Mar, 2018 1 commit
  20. 10 Feb, 2018 1 commit
  21. 06 Feb, 2018 1 commit
  22. 30 Jan, 2018 1 commit
  23. 23 Jan, 2018 2 commits
  24. 16 Jan, 2018 1 commit
  25. 24 Nov, 2017 1 commit
  26. 28 Oct, 2017 1 commit
  27. 15 Aug, 2017 1 commit
  28. 04 Aug, 2017 2 commits
  29. 29 Jul, 2017 1 commit
  30. 27 Jul, 2017 1 commit
  31. 23 Jul, 2017 2 commits