  1. 13 Jan, 2022 1 commit
  2. 28 Dec, 2021 1 commit
    • Utilize StreamSafeCUDAAllocator to support fast GC in new executor (#37642) · 0c7153a4
      Committed by From00
      * fix reshape move storage error
      
      * remove needless set type
      
      * alloc tensor by shared storage
      
      * Utilize StreamSafeCUDAAllocator to support fast GC in new executor
      
      * Fix compile error for Windows and ROCm
      
      * Fix compile error for Windows
      
      * Modify UT stream_safe_cuda_alloc_test
      
      * Modify UT stream_safe_cuda_alloc_test
      
      * Rewrite fast GC
      
      * Rewrite fast GC
      
      * Fix compile error for BOOST_GET_CONST
      
      * Fix compile error for BOOST_GET_CONST
      
      * Changes default stream for StreamSafeCUDAAllocator
      
      * Fix a small CI error
      
      * Remove some redundant code
      
      * Fix conflict
      
      * Fix compile error for ROCm
      
      * Fix Windows CI error
      
      * Fix CI error
      
      * Remove some unnecessary code
      
      * Fix CI error
      
      * Add UT for fast GC
      
      * Fix CI error
      
      * add device-agnostic stream class
      
      * add stream.h
      
      * fix ut
      
      * fix cpu compile
      
      * Use RWLock in GetAllocator
      
      * Fix CI error
      Co-authored-by: Chen Weihang <chenweihang@baidu.com>
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
  3. 27 Dec, 2021 1 commit
  4. 17 Dec, 2021 1 commit
  5. 25 Nov, 2021 1 commit
    • Support multi-stream allocation for CUDA place (#37290) · b9c464c3
      Committed by From00
      * Support multi-stream allocation for CUDA place
      
      * Do not notify the retrying from other streams when free CUDA allocation
      
      * Fix compile error for CPU
      
      * Fix compile error for HIP
      
      * Release memory for StreamSafeCUDAAllocaRetry in malloc_test
      
      * Add FLAGS_use_stream_safe_cuda_allocator
      
      * Fix CI error for 'set_tests_properties'
      
      * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy
      
      * Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock
      
      * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
      
      * Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
      
      * Add UT for alloc interface
      
      * Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
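The free path described in the bullets above (release immediately when no other stream has been recorded, otherwise defer until those streams finish) can be illustrated with a minimal CPU-only sketch. Everything here is a hypothetical stand-in, not Paddle's `StreamSafeCUDAAllocator`: streams are plain integer ids, and "stream finished" is modeled by a busy set instead of CUDA events.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>

using StreamId = int;  // hypothetical stand-in for a CUDA stream handle

struct Allocation {
  std::size_t size = 0;
  std::set<StreamId> recorded_streams;  // non-owner streams that used this memory
};

class StreamSafeAllocatorSketch {
 public:
  ~StreamSafeAllocatorSketch() {
    for (Allocation* a : pending_) Release(a);  // avoid leaking deferred frees
  }

  Allocation* Allocate(std::size_t size) {
    ProcessPendingFrees();  // reclaim deferred allocations whose streams are done
    Allocation* a = new Allocation;
    a->size = size;
    ++live_;
    return a;
  }

  void RecordStream(Allocation* a, StreamId s) { a->recorded_streams.insert(s); }

  // Fast path: nothing recorded, so the memory can be released right away.
  // Slow path: defer until every recorded stream is idle.
  void Free(Allocation* a) {
    if (a->recorded_streams.empty()) {
      Release(a);
    } else {
      pending_.push_back(a);
    }
  }

  // Simulated stream state; a real allocator would query recorded events instead.
  void MarkStreamBusy(StreamId s) { busy_.insert(s); }
  void MarkStreamIdle(StreamId s) { busy_.erase(s); }

  void ProcessPendingFrees() {
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (AllRecordedStreamsIdle(**it)) {
        Release(*it);
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::size_t live() const { return live_; }
  std::size_t pending() const { return pending_.size(); }

 private:
  bool AllRecordedStreamsIdle(const Allocation& a) const {
    for (StreamId s : a.recorded_streams) {
      if (busy_.count(s) != 0) return false;
    }
    return true;
  }

  void Release(Allocation* a) {
    assert(a != nullptr);
    delete a;
    --live_;
  }

  std::list<Allocation*> pending_;  // freed by the owner, still in use elsewhere
  std::set<StreamId> busy_;
  std::size_t live_ = 0;
};
```

In this sketch the deferred list is only touched on `Free`, which loosely mirrors the commit note about inserting into the outstanding-events map at free time and not at allocation time.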
  6. 06 Nov, 2020 1 commit
  7. 04 Nov, 2020 1 commit
  8. 24 Sep, 2020 1 commit
    • use iwyu clean include (#27267) · df43905f
      Committed by wanghuancoder
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
  9. 11 Sep, 2019 1 commit
    • Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Committed by Huihuang Zheng
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it improves memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
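The stream-callback idea in the commit message above can be sketched without a GPU. The code below is an illustration under stated assumptions, not Paddle's `CUDADeviceContextAllocation`: `FakeStream` and `StreamScopedBuffer` are hypothetical names, and the stream is simulated as a FIFO queue of host callables that run when `Synchronize()` is called. The point is only that the buffer's destructor queues the free behind work already submitted to the stream, so no global singleton has to track it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical CPU-only stand-in for a device stream: tasks run in FIFO order.
class FakeStream {
 public:
  void Enqueue(std::function<void()> task) { tasks_.push(std::move(task)); }

  // Runs everything queued so far, in submission order.
  void Synchronize() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      task();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Scratch buffer tied to one stream. Destroying the object does not free the
// memory directly; it enqueues a callback that frees it once earlier work on
// the stream has run.
class StreamScopedBuffer {
 public:
  StreamScopedBuffer(FakeStream* stream, std::size_t bytes, int* live_counter)
      : stream_(stream), ptr_(std::malloc(bytes)), live_(live_counter) {
    assert(ptr_ != nullptr);
    ++*live_;
  }

  StreamScopedBuffer(const StreamScopedBuffer&) = delete;
  StreamScopedBuffer& operator=(const StreamScopedBuffer&) = delete;

  ~StreamScopedBuffer() {
    void* p = ptr_;
    int* live = live_;
    stream_->Enqueue([p, live] {
      std::free(p);
      --*live;
    });
  }

  void* data() const { return ptr_; }

 private:
  FakeStream* stream_;
  void* ptr_;
  int* live_;  // counts buffers whose memory has not been freed yet
};
```

With a real CUDA stream, the enqueue step would correspond to registering a host callback on the stream; that mapping is an assumption of this sketch and is not shown in the source.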
  10. 10 Jun, 2019 1 commit
  11. 16 Nov, 2018 1 commit
  12. 14 Nov, 2018 1 commit
  13. 10 Oct, 2018 1 commit
  14. 28 Sep, 2018 1 commit
  15. 08 Apr, 2018 1 commit
  16. 26 Mar, 2018 3 commits
  17. 20 Mar, 2018 1 commit
  18. 12 Feb, 2018 1 commit
  19. 10 Feb, 2018 2 commits
  20. 05 Feb, 2018 1 commit
  21. 09 Jan, 2018 1 commit
  22. 18 Aug, 2017 1 commit
  23. 04 Aug, 2017 1 commit
  24. 28 Jul, 2017 1 commit
  25. 25 Jul, 2017 1 commit
  26. 22 Jul, 2017 1 commit
  27. 21 Jul, 2017 2 commits
  28. 19 Jul, 2017 3 commits
    • Add memcpy · e53a48b4
      Committed by liaogang
    • Simplify Tensor implementation · 55d30172
      Committed by fengjiayi
      ATTENTION: some interfaces changed:
      1. void Tensor::set_dims(const DDim& dims) ==> void Tensor::Resize(const DDim& dims)
      2. void Tensor::ShareDataFrom(const Tensor& src) ==> void Tensor::ShareDataWith(const Tensor& src)
      3. DDim Tensor::dims() const ==> const DDim& Tensor::dims() const
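The three renamed interfaces listed above can be illustrated with a toy class. This is a minimal sketch, not Paddle's `Tensor`: `DDim` is assumed to be a plain vector, storage is a shared `std::vector<float>`, and `mutable_data()`/`data()` are simplified helpers added only so the example can show sharing. Whether `ShareDataWith` also copies the dims is an assumption made here for convenience.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using DDim = std::vector<int64_t>;  // hypothetical stand-in for Paddle's DDim

class Tensor {
 public:
  // Formerly set_dims(): changes the shape only, no allocation.
  void Resize(const DDim& dims) { dims_ = dims; }

  // Now returns a const reference, so callers no longer copy the DDim.
  const DDim& dims() const { return dims_; }

  // Formerly ShareDataFrom(): this tensor aliases src's storage.
  // Copying the dims as well is an assumption of this sketch.
  void ShareDataWith(const Tensor& src) {
    holder_ = src.holder_;
    dims_ = src.dims_;
  }

  // Simplified helper: allocate lazily, sized from the current dims.
  float* mutable_data() {
    std::size_t n = dims_.empty() ? 0 : 1;
    for (int64_t d : dims_) n *= static_cast<std::size_t>(d);
    if (!holder_ || holder_->size() < n) {
      holder_ = std::make_shared<std::vector<float>>(n);
    }
    return holder_->data();
  }

  const float* data() const {
    assert(holder_ != nullptr);
    return holder_->data();
  }

 private:
  DDim dims_;
  std::shared_ptr<std::vector<float>> holder_;
};
```

Call sites that previously wrote `t.set_dims(d)` or `t.ShareDataFrom(src)` would switch to `t.Resize(d)` and `t.ShareDataWith(src)`. Code that stored the result of `dims()` by value keeps compiling after the change to a reference return, though it still makes a copy.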
    • Add memcpy · 028f3dc4
      Committed by liaogang
  29. 06 Jul, 2017 1 commit
  30. 28 Jun, 2017 1 commit
  31. 27 Jun, 2017 1 commit
  32. 26 Jun, 2017 2 commits
  33. 25 May, 2017 1 commit