1. 10 Dec, 2021 1 commit
  2. 08 Dec, 2021 1 commit
    • Fix CUDAGraphAllocator bug for StreamSafeCUDAAllocator (#37821) · b4a67491
      Committed by From00
      * Fix CUDAGraph bug for StreamSafeCUDAAllocator
      
      * Add CUDAGraphAllocator check in multi-stream interface
      
      * Set FLAGS_use_stream_safe_cuda_allocator to false by default
      
      * Fix environment error for cmake
      
      * Fix cmake error
      
      * Add UT of GetAllocatorInterfaceTest
      
      * Add UT of CUDAGraphExceptionTest
      
      * Enhance CUDAGraphExceptionTest
  3. 07 Dec, 2021 1 commit
  4. 03 Dec, 2021 1 commit
  5. 27 Nov, 2021 1 commit
    • [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Committed by Aganlengzi
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
  6. 25 Nov, 2021 1 commit
    • Support multi-stream allocation for CUDA place (#37290) · b9c464c3
      Committed by From00
      * Support multi-stream allocation for CUDA place
      
      * Do not notify retrying allocations from other streams when freeing a CUDA allocation
      
      * Fix compile error for CPU
      
      * Fix compile error for HIP
      
      * Release memory for StreamSafeCUDAAllocaRetry in malloc_test
      
      * Add FLAGS_use_stream_safe_cuda_allocator
      
      * Fix CI error for 'set_tests_properties'
      
      * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy
      
      * Performance improvement: insert the allocation pair into outstanding_events_map on free rather than on alloc; replace recursive_mutex with SpinLock
      
      * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
      
      * Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
      
      * Add UT for alloc interface
      
      * Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
  7. 23 Nov, 2021 1 commit
  8. 08 Nov, 2021 1 commit
    • Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a
      Committed by wanghuancoder
      * Use cuda virtual memory management and merge blocks, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * window dll, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * use autogrowthv2 for system allocator, test=develop
      
      * remove ~CUDAVirtualMemAllocator(), test=develop
      
      * refine, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix bug, test=develop
      
      * revert system allocator, test=develop
      
      * revert multiprocessing, test=develop
      
      * fix AutoGrowthBestFitAllocatorV2 mutex, test=develop
      
      * catch cudaErrorInitializationError when create allocator, test=develop
      
      * fix cuMemSetAccess use, test=develop
      
      * refine cuda api use, test=develop
      
      * refine, test=develop
      
      * for test, test=develop
      
      * for test, test=develop
      
      * switch to v2, test=develop
      
      * refine virtual allocator, test=develop
      
      * Record cuMemCreate and cuMemRelease, test=develop
      
      * refine, test=develop
      
      * avoid out of bounds, test=develop
      
      * rename allocator, test=develop
      
      * refine, test=develop
      
      * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop
      
      * for test,test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
  9. 01 Nov, 2021 1 commit
  10. 11 Oct, 2021 1 commit
  11. 29 Sep, 2021 1 commit
    • Add basic support for CUDA Graph (#36190) · 21b93c3d
      Committed by Zeng Jinle
      * add basic support for CUDA Graph
      
      * fix ci compile error
      
      * fix LOG print, fix windows CI
      
      * follow comments and update
      
      * small fix for default ctor
      
      * fix rocm compile error
      
      * fix CPU compile error
  12. 17 Sep, 2021 1 commit
    • Make flag adding easier (#35823) · 2c781455
      Committed by Zeng Jinle
      * make flag setter easier
      
      * update
      
      * rename macro name
      
      * fix bug of public/writable
      
      * update to pass CI
      
      * polish
      
      * fix CPU link error
  13. 03 Aug, 2021 1 commit
  14. 19 Jul, 2021 1 commit
  15. 12 May, 2021 1 commit
  16. 09 Apr, 2021 1 commit
    • [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Committed by Leo Chen
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
  17. 04 Feb, 2021 1 commit
  18. 01 Feb, 2021 1 commit
  19. 12 Jan, 2021 1 commit
  20. 20 Nov, 2020 1 commit
  21. 06 Nov, 2020 1 commit
  22. 04 Nov, 2020 1 commit
  23. 21 Aug, 2020 1 commit
    • support Baidu Kunlun AI Accelerator (#25959) · 138ecf24
      Committed by QingshuChen
      * support Baidu AI Accelerator
        * test=kunlun
      
      * minor
       * test=kunlun
      
      * support xpu op in separate file
       * test=kunlun
      
      * update XPU error message and remove duplicated code
      
       * test=kunlun
      
      * minor
       * test=kunlun
      
      * minor
       * test=kunlun
  24. 24 Jun, 2020 1 commit
    • Refine error message in memory folder (#25095) · ff5be2fb
      Committed by Leo Chen
      * refine PADDLE_THROW, test=develop
      
      * refine error msg, test=develop
      
      * refine cuda error, test=develop
      
      * follow comments, test=develop
      
      * fix compile problem, test=develop
      
      * fix bug, test=develop
  25. 21 Apr, 2020 1 commit
  26. 28 Nov, 2019 1 commit
  27. 09 Sep, 2019 1 commit
  28. 01 Sep, 2019 1 commit
    • Add retry_allocator for gpu (#19409) · 0a73f720
      Committed by Zeng Jinle
      * add retry_allocator for gpu, test=develop
      
      * follow chengduoZH's comments, test=develop
      
      * follow huihuang's comments,test=develop
      
      * change f,l in enforce.h to be file,line, test=develop
      
      * increase code coverage by adding unittests, test=develop
      
      * fix CMakeLists.txt, test=develop
  29. 18 Jul, 2019 1 commit
    • Feature/auto_growth_allocator (#18561) · ae58afc5
      Committed by Zeng Jinle
      * feature/auto_growth_allocator, test=develop
      
      * add unittest of AlignedAllocator, test=develop
      
      * try to turn on auto_growth to test on CI, test=develop
      
      * fix segmentation fault in mixed_vector.h, test=develop
      
      * add unittests, test=develop
  30. 12 Jul, 2019 1 commit
  31. 10 Jun, 2019 1 commit
  32. 27 May, 2019 1 commit
    • Code clean of Allocator (#17602) · 4aa931dd
      Committed by Zeng Jinle
      * Revert "Revert "Fix allocator bug""
      
      This reverts commit 174d0d0b.
      
      * Revert "fix travis ci"
      
      This reverts commit 5656fa9f.
      
      test=develop
      
      * add inlined_vector.h, test=develop
      
      * add inlined_vector_test,test=develop
      
      * clean code of allocator,test=develop
      
      * delete zero_size_allocator.h,test=develop
      
      * fix failed unittest,test=develop
  33. 23 May, 2019 1 commit
    • Fix allocator bug (#16712) · c6189637
      Committed by Zeng Jinle
      * Revert "Revert "Fix allocator bug""
      
      This reverts commit 174d0d0b.
      
      * Revert "fix travis ci"
      
      This reverts commit 5656fa9f.
      
      test=develop
      
      * add inlined_vector.h, test=develop
      
      * add inlined_vector_test,test=develop
  34. 28 Mar, 2019 1 commit
  35. 25 Mar, 2019 1 commit
    • split PR · c20db635
      Committed by sneaxiy
      test=develop
  36. 21 Mar, 2019 1 commit
    • add more unittest · 953214ad
      Committed by sneaxiy
      modify allocator strategy
      remove changes of legacy buddy_allocator
      test=develop
  37. 18 Mar, 2019 1 commit
  38. 06 Mar, 2019 1 commit
  39. 13 Feb, 2019 1 commit
    • Clang build fixes (#15628) · da9c94da
      Committed by Gabor Buella
      * Remove some superfluous std::move calls
      
      The std::move triggered a build error (with -Werror):
      ```
      [  9%] Building CXX object paddle/fluid/memory/allocation/CMakeFiles/allocator_facade.dir/allocator_facade.cc.o
      /home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: error: moving a temporary object prevents copy elision [-Werror,-Wpessimizing-move]
                  [this] { return std::move(CreateAllocatorWithChunk()); }, capacity);
                                  ^
      /home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: note: remove std::move call here
                  [this] { return std::move(CreateAllocatorWithChunk()); }, capacity);
                                  ^~~~~~~~~~                          ~
      1 error generated.
      ```
      
      See: https://reviews.llvm.org/D7633
      
      * Remove a superfluous lambda capture from framework/operator.h
      
      ```
      [ 10%] Building CXX object paddle/fluid/platform/CMakeFiles/device_context.dir/init.cc.o
      In file included from /home/tej/code/gbuella_paddle/paddle/fluid/platform/init.cc:19:
      /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.h:229:21: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
                         [this](Variable* var) { return var; });
                          ^~~~
      1 error generated.
      ```
      
      Changing it to `return it->second;`, as is in the function below.
      
      * Rethrow an exception (instead of copying it)
      
      ```
      [ 11%] Building CXX object paddle/fluid/framework/CMakeFiles/operator.dir/operator.cc.o
      /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: error: local variable 'exception' will be copied despite being thrown by name [-Werror,-Wreturn-std-move]
            throw exception;
                  ^~~~~~~~~
      /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: note: call 'std::move' explicitly to avoid copying
            throw exception;
                  ^~~~~~~~~
                  std::move(exception)
      
      ```
      
      See https://reviews.llvm.org/D43322 for an explanation of this diagnostic message.
      
      * Remove an unused variable
      
      ```
      /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:884:16: error: private field 'scope_' is not used [-Werror,-Wunused-private-field]
        const Scope& scope_;
                     ^
      ```
      
      * struct ComputationOpHandle -> class ComputationOpHandle
      
      ```
      [ 13%] Building CXX object paddle/fluid/framework/details/CMakeFiles/memory_early_delete_pass.dir/memory_early_delete_pass.cc.o
      In file included from /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/memory_early_delete_pass.cc:21:
      /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: error: class 'ComputationOpHandle' was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
      class ComputationOpHandle;
      ^
      /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/computation_op_handle.h:29:8: note: previous use is here
      struct ComputationOpHandle : public OpHandleBase {
             ^
      /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: note: did you mean struct here?
      class ComputationOpHandle;
      ^~~~~
      struct
      1 error generated.
      ```
      
      * Fix name() methods under fluid/operators
      
      ```
      In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.cc:15:
      In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.h:19:
      /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/jitcode.h:71:23: error: 'name' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
        virtual const char* name() const = 0;
                            ^
      /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen_base.h:31:23: note: overridden virtual function is here
        virtual const char* name() const = 0;
                            ^
      ```
      
      test=develop
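The first fix in the Clang build commit above (dropping `std::move` around a returned temporary) can be illustrated with a minimal C++ sketch. `Chunk`, `MakeChunk`, and the other names below are hypothetical stand-ins rather than Paddle code. The counts checked in `CheckMoveCounts` assume a compiler that performs copy elision, which is guaranteed from C++17 on and is the default behaviour of mainstream compilers before that.

```cpp
#include <cassert>
#include <utility>

// Counts how many times the move constructor of Chunk has run.
static int g_move_count = 0;

// Hypothetical stand-in for an allocator-like object.
struct Chunk {
  Chunk() = default;
  Chunk(const Chunk&) = default;
  Chunk(Chunk&&) noexcept { ++g_move_count; }
};

// Hypothetical factory, playing the role of CreateAllocatorWithChunk().
Chunk MakeChunk() { return Chunk(); }

// The pattern Clang flags with -Wpessimizing-move: std::move turns the
// temporary into an xvalue, so the return value has to be move-constructed.
Chunk MakeWithMove() { return std::move(MakeChunk()); }

// The fixed pattern: returning the temporary directly lets the compiler
// construct the result in place.
Chunk MakeWithoutMove() { return MakeChunk(); }

// Runs a factory once and reports how many move constructions it caused.
template <typename Factory>
int CountMoves(Factory factory) {
  g_move_count = 0;
  Chunk c = factory();
  (void)c;
  return g_move_count;
}

// Compares the two patterns under the copy-elision assumption stated above.
void CheckMoveCounts() {
  assert(CountMoves(MakeWithMove) == 1);
  assert(CountMoves(MakeWithoutMove) == 0);
}
```

Calling `CheckMoveCounts()` from a test or from `main` exercises both variants. The extra move is cheap for this empty type, but for a heavier object it is wasted work, which is why the diagnostic recommends removing the call.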
  40. 26 Nov, 2018 1 commit