1. 25 Nov, 2021 1 commit
    • Support multi-stream allocation for CUDA place (#37290) · b9c464c3
      From00 committed
      * Support multi-stream allocation for CUDA place
      
      * Do not notify the retrying from other streams when free CUDA allocation
      
      * Fix compile error for CPU
      
      * Fix compile error for HIP
      
      * Release memory for StreamSafeCUDAAllocaRetry in malloc_test
      
      * Add FLAGS_use_stream_safe_cuda_allocator
      
      * Fix CI error for 'set_tests_properties'
      
      * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy
      
      * Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock
      
      * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
      
      * Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
      
      * Add UT for alloc interface
      
      * Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
  2. 06 Nov, 2020 1 commit
  3. 04 Nov, 2020 1 commit
  4. 24 Sep, 2020 1 commit
    • use iwyu clean include (#27267) · df43905f
      wanghuancoder committed
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
  5. 11 Sep, 2019 1 commit
    • Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Huihuang Zheng committed
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, we can delete it to improve memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, avoiding the singleton.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
  6. 10 Jun, 2019 1 commit
  7. 16 Nov, 2018 1 commit
  8. 14 Nov, 2018 1 commit
  9. 10 Oct, 2018 1 commit
  10. 28 Sep, 2018 1 commit
  11. 08 Apr, 2018 1 commit
  12. 26 Mar, 2018 3 commits
  13. 20 Mar, 2018 1 commit
  14. 12 Feb, 2018 1 commit
  15. 10 Feb, 2018 2 commits
  16. 05 Feb, 2018 1 commit
  17. 09 Jan, 2018 1 commit
  18. 18 Aug, 2017 1 commit
  19. 04 Aug, 2017 1 commit
  20. 28 Jul, 2017 1 commit
  21. 25 Jul, 2017 1 commit
  22. 22 Jul, 2017 1 commit
  23. 21 Jul, 2017 2 commits
  24. 19 Jul, 2017 3 commits
    • Add memcpy · e53a48b4
      liaogang committed
    • Simplify Tensor implimentation · 55d30172
      fengjiayi committed
      ATTENTION: some interfaces changed:
      1. void Tensor::set_dims(const DDim& dims) ==> void Tensor::Resize(const DDim& dims).
      2. void Tensor::ShareDataFrom(const Tensor& src)  ==> void Tensor::ShareDataWith(const Tensor& src)
      3. DDim Tensor::dims() const ==> const DDim& Tensor::dims() const
    • Add memcpy · 028f3dc4
      liaogang committed
  25. 06 Jul, 2017 1 commit
  26. 28 Jun, 2017 1 commit
  27. 27 Jun, 2017 1 commit
  28. 26 Jun, 2017 2 commits
  29. 25 May, 2017 1 commit
  30. 09 Dec, 2016 1 commit
  31. 29 Aug, 2016 1 commit
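
The interface changes listed in commit 55d30172 (set_dims ==> Resize, ShareDataFrom ==> ShareDataWith, and dims() returning a const reference) can be illustrated with a toy class. This is only a sketch under stated assumptions, not the project's actual Tensor: the DDim alias, the shared_ptr buffer, and the mutable_data/data helpers are hypothetical stand-ins introduced here so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for DDim: just a list of extents.
using DDim = std::vector<int64_t>;

// Toy Tensor that mirrors only the three members named in the commit message.
class Tensor {
 public:
  // Formerly set_dims(): records the new shape only; no allocation happens here.
  void Resize(const DDim& dims) { dims_ = dims; }

  // Formerly ShareDataFrom(): this tensor aliases src's buffer and copies its shape.
  void ShareDataWith(const Tensor& src) {
    holder_ = src.holder_;
    dims_ = src.dims_;
  }

  // Returns a const reference instead of a copy.
  const DDim& dims() const { return dims_; }

  // Sketch-only helper (assumption): allocate a zeroed float buffer matching dims_.
  float* mutable_data() {
    std::size_t n = 1;
    for (int64_t d : dims_) n *= static_cast<std::size_t>(d);
    holder_ = std::make_shared<std::vector<float>>(n, 0.0f);
    return holder_->data();
  }

  // Sketch-only helper (assumption): read access to the shared buffer, or nullptr.
  const float* data() const { return holder_ ? holder_->data() : nullptr; }

 private:
  DDim dims_;
  std::shared_ptr<std::vector<float>> holder_;
};

// Exercises the renamed members; returns true when sharing behaves as described.
bool DemoRenamedInterfaces() {
  Tensor a;
  a.Resize({2, 3});            // was: a.set_dims(...)
  float* p = a.mutable_data();
  p[0] = 42.0f;

  Tensor b;
  b.ShareDataWith(a);          // was: b.ShareDataFrom(a)
  const DDim& shape = b.dims();  // binds to the member, no copy
  return shape == DDim{2, 3} && b.data() == a.data() && b.data()[0] == 42.0f;
}
```

Calling DemoRenamedInterfaces() from a test or from main checks that b sees the same buffer and shape as a. Returning const DDim& from dims() avoids copying the shape on every call, which is presumably the motivation for the third change, though the commit message does not say so.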