1. 26 5月, 2023 1 次提交
    • Y
      [PHI Decoupling]Create PHI shared lib (#53735) · da50a009
      YuanRisheng 提交于
      * create phi so
      
      * fix ci bugs
      
      * fix py3 bugs
      
      * add file
      
      * fix py3 bugs
      
      * fix windows bugs
      
      * perfect so
      
      * fix py3 bugs
      
      * delete all static target in phi
      
      * fix windows bugs
      
      * fix py3 bugs
      
      * fix ci bugs
      
      * fix windows bugs
      
      * fix bugs: gflags can't be linked by dynamic and static lib
      
      * fix bugs that can not load 3rd party
      
      * fix ci bugs
      
      * fix compile bugs
      
      * fix py3 bugs
      
      * fix conflict
      
      * fix xpu bugs
      
      * fix mac compile bugs
      
      * fix psgpu bugs
      
      * fix inference failed
      
      * deal with conflict
      
      * fix LIBRARY_PATH bug
      
      * fix windows bugs
      
      * fix onednn error
      
      * fix windows compile bugs
      
      * fix windows compile bugs
      
      * fix test_cuda_graph_static_mode_error aborted
      
      * fix windows bugs
      
      * fix mac-python3 error
      
      * fix hip compile bugs
      
      * change mode to static
      
      * change to static mode
      
      * fix ci bugs
      
      * fix py3 bugs
      
      * fix windows bugs
      
      * fix bugs
      
      * add static flag
      
      * add PADDLE_API
      
      * change position of PADDLE_API
      
      * fix windows bugs
      
      * change mode to dynamic lib
      
      * fix windows static bugs
      
      * deal with conflict
      
      * fix windows unit bug
      
      * fix coverage
      
      * deal with conflict
      
      * fix windows-inference
      
      * fix py3 bugs
      
      * fix bugs when compile type_info
      
      * fix compile bugs
      
      * fix py3 bugs
      
      * fix windows bugs
      
      * fix windows openblas
      
      * fix xpu bugs
      
      * fix enforce_test in windows
      
      * update code according comment
      
      * fix windows cmake bug
      
      * fix windows bugs
      
      * fix windows bugs
      
      * delete cinn unittest
      
      * fix cinn bugs
      
      ---------
      Co-authored-by: HappyHeavyRain's avatarlzydev <1528794076@qq.com>
      da50a009
  2. 03 4月, 2023 1 次提交
  3. 08 12月, 2022 1 次提交
    • H
      [PHI decoupling] move cuda_graph from fluid to phi (#48686) · a4d9851b
      huangjiyi 提交于
      * move cuda_graph from fluid to phi
      
      * move device_memory_aligment from fluid to phi
      
      * Revert "move device_memory_aligment from fluid to phi"
      
      This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.
      
      * update xpu cmake
      a4d9851b
  4. 09 11月, 2022 1 次提交
  5. 08 11月, 2022 1 次提交
  6. 27 10月, 2022 1 次提交
    • L
      make all cpp tests dynamic linked to libpaddle.so [except windows] (#47088) · 2096448b
      Leo Chen 提交于
      * make all cpp tests dynamic linked to libpaddle.so
      
      * add comments
      
      * keep old cc_test for some tests
      
      * fix some ut
      
      * make some ut use cc_test_old
      
      * fix typos and fit for win32
      
      * fix lib path
      
      * fix some tests
      
      * skip lite test
      
      * fit for rocm
      
      * fit for cinn
      
      * fit for mac
      
      * fit for win32
      
      * skip inference ut
      
      * skip  windows
      
      * fix coverage
      2096448b
  7. 01 9月, 2022 1 次提交
  8. 01 8月, 2022 1 次提交
  9. 19 7月, 2022 1 次提交
  10. 14 7月, 2022 1 次提交
    • L
      refine allocation cmake (#44241) · dc5a0420
      Leo Chen 提交于
      * build into one static library
      
      * move memory/detail to memory/allocation
      
      * fix bug
      
      * fix profiler
      
      * fix framework_proto
      
      * fix deps
      
      * fix inference compilation
      
      * fix rocm compile
      
      * follow comments
      
      * fix buddy_allocator_test
      dc5a0420
  11. 24 6月, 2022 1 次提交
    • C
      record memory and op supplement info (#43550) · 8dd0a3b9
      chenjian 提交于
      * record memory and op supplement info
      
      * update
      
      * update
      
      * fix a bug
      
      * fix memory recording
      
      * fix a bug
      
      * update
      
      * update
      
      * fix a bug
      
      * update
      
      * fix a bug
      
      * fix a bug
      
      * fix a bug
      
      * Revert "fix a bug"
      
      This reverts commit c1d4df52762ba9ae7c7e27cd2ba4fc3a7ed9c7a5.
      
      * fix a bug
      
      * fix format
      
      * fix
      8dd0a3b9
  12. 14 6月, 2022 1 次提交
  13. 04 6月, 2022 1 次提交
  14. 30 3月, 2022 1 次提交
    • F
      Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d
      From00 提交于
      Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)
      
      * Add new API memory_reserved
      
      * Add memory_allocated, max_memory_reserved and max_memory_allocater
      
      * Fix CI error
      
      * Fix CI error
      
      * Enhance UT
      
      * Add FLAGS_memory_stats_opt
      
      * Add STATS macro functions
      
      * Add StatAllocator
      
      * Fix CI errors
      
      * Add UT
      
      * Fix CI errors
      afe02e9d
  15. 14 3月, 2022 1 次提交
  16. 03 3月, 2022 1 次提交
  17. 15 2月, 2022 1 次提交
    • R
      [PluggableDevice] Add custom runtime support (#38740) · 3e7825f3
      ronnywang 提交于
      * [CustomRuntime] Add DeviceManager
      
      * [CustomRuntime] Add DeviceInterface
      
      * [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager
      
      * [CustomRuntime] Add plug-in device
      
      * [CustomRuntime] Memory module support PluggableDevice
      
      * [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option
      
      * update
      
      * [API] update API doc based on comments, test=develop
      Co-authored-by: Nqili93 <qili93@qq.com>
      3e7825f3
  18. 08 2月, 2022 1 次提交
    • F
      Support allocate CUDA managed memory (#39075) · 42910361
      From00 提交于
      * Rough implementation for experiment
      
      * Support allocate cuda managed memory
      
      * Fix CI error
      
      * Modify UT
      
      * Check whether support memory oversubscription
      
      * Fix ROCM Compile error
      
      * Fix ROCM Compile error
      
      * Fix UT cuda_managed_memory_test
      
      * Set UT timeout to 40
      
      * Add UT OOMExceptionTest
      
      * Set UT timeout to 50
      42910361
  19. 25 1月, 2022 1 次提交
  20. 17 12月, 2021 3 次提交
  21. 07 12月, 2021 1 次提交
  22. 25 11月, 2021 1 次提交
    • F
      Support multi-stream allocation for CUDA place (#37290) · b9c464c3
      From00 提交于
      * Support multi-stream allocation for CUDA place
      
      * Do not notify the retrying from other streams when free CUDA allocation
      
      * Fix compile error for CPU
      
      * Fix compile error for HIP
      
      * Release memory for StreamSafeCUDAAllocaRetry in malloc_test
      
      * Add FLAGS_use_stream_safe_cuda_allocator
      
      * Fix CI error for 'set_tests_properties'
      
      * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy
      
      * Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock
      
      * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
      
      * Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator
      
      * Add UT for alloc interface
      
      * Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
      b9c464c3
  23. 08 11月, 2021 1 次提交
    • W
      Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a
      wanghuancoder 提交于
      * Use cuda virtual memory management and merge blocks, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * window dll, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * use autogrowthv2 for system allocator, test=develop
      
      * remove ~CUDAVirtualMemAllocator(), test=develop
      
      * refine, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix bug, test=develop
      
      * revert system allocator, test =develop
      
      * revert multiprocessing, test=develop
      
      * fix AutoGrowthBestFitAllocatorV2 mutxt, test=develop
      
      * catch cudaErrorInitializationError when create allocator, test=develop
      
      * fix cuMemSetAccess use, test=develop
      
      * refine cuda api use, test=develop
      
      * refine, test=develop
      
      * for test, test=develop
      
      * for test, test=develop
      
      * switch to v2, test=develop
      
      * refine virtual allocator, test=develop
      
      * Record cuMemCreate and cuMemRelease, test=develop
      
      * refine, test=develop
      
      * avoid out of bounds, test=develop
      
      * rename allocator, test=develop
      
      * refine, test=develop
      
      * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop
      
      * for test,test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      a1ec1d5a
  24. 29 9月, 2021 1 次提交
    • Z
      Add basic support for CUDA Graph (#36190) · 21b93c3d
      Zeng Jinle 提交于
      * add basic support for CUDA Graph
      
      * fix ci compile error
      
      * fix LOG print, fix windows CI
      
      * follow comments and update
      
      * small fix for default ctor
      
      * fix rocm compile error
      
      * fix CPU compile error
      21b93c3d
  25. 17 9月, 2021 1 次提交
    • Z
      Make flag adding easier (#35823) · 2c781455
      Zeng Jinle 提交于
      * make flag setter easier
      
      * update
      
      * rename macro name
      
      * fix bug of public/writable
      
      * update to pass CI
      
      * polish
      
      * fix CPU link error
      2c781455
  26. 12 5月, 2021 1 次提交
  27. 09 4月, 2021 1 次提交
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen 提交于
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, bug failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinilize
      
      * add system allocatot test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code stype
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: Nxiayanming <41795079@qq.com>
      Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
      Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
      Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
      ccf5709d
  28. 07 4月, 2021 1 次提交
    • Z
      【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3
      zhang wenhui 提交于
      * Ascend rc (#30483)
      
      * Fix compilcation on CANN20.1 and older (#30494)
      
      Fix compilcation on CANN20.1 and older
      
      * Add distribution supported (#30578)
      
      Add distribution supported
      
      * Build praser for Hcom* operators (#30627)
      
      Build praser for Hcom* operators
      
      * Pass device_ids info from launch to trainer. (#30632)
      
      Pass device_ids info from launch to trainer
      
      * Add Hccl program group (#30642)
      
      Add Hccl program group
      
      * Add startup bash files of test_ascend_group. (#30645)
      
      Add startup bash files of test_ascend_group
      
      * cleanup (#30646)
      
      cleanup test_ascend_group.py
      
      * [Feature] Build parser to support distributed training (#30658)
      
      [Feature] Build parser to support distributed training
      
      * fix compilation on ascend-20.1 (#30722)
      
      fix compilation on ascend-20.1
      
      * Dev/fix ascend string (#30749)
      
      Dev/fix ascend string
      
      * code style (#30781)
      
      code style
      
      * Merge ascend_optimizer and ascend_parser. (#30776)
      
      Merge ascend_optimizer and ascend_parser.
      
      * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)
      
      Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
      
      * Add paddle ascend distribution training supported (#30796)
      
      Add paddle ascend distribution training supported
      
      * pass cxx_flags to gloo cmake (#30857)
      
      * Destroy session first. (#30954)
      
      Destroy session first.
      
      * merge
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style, test=develop
      
      * fix, test=develop
      
      * fix
      
      * fix log fatal, test=develop
      
      * fix enforce style, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix rccl, test=develop
      
      * fix test, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix node_num, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
      Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
      Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
      Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
      Co-authored-by: Ndingsiyu <18369187719@163.com>
      Co-authored-by: NOleNet <olenet@126.com>
      8c7c53b3
  29. 22 2月, 2021 1 次提交
  30. 11 12月, 2020 1 次提交
    • L
      Add the strategy of skipping cc/cu test compilation and execution in CI (#29499) · b5d4a1f3
      LoveAn 提交于
      * Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop
      
      * fix if error with CI_SKIP_TEST, test=develop
      
      * fix add properties to test error on Linux/MAC, test=develop
      
      * fix set test properties of test_code_generator error, test=develop
      
      * remove test codes and advance judgment of file modification on Linux, test=develop
      
      * rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix
      
      * Add branch judgement on Linux, test=develop
      b5d4a1f3
  31. 04 11月, 2020 1 次提交
  32. 21 8月, 2020 1 次提交
    • Q
      support Baidu Kunlun AI Accelerator (#25959) · 138ecf24
      QingshuChen 提交于
      * support Baidu AI Accelerator
        * test=kunlun
      
      * minor
       * test=kunlun
      
      * support xpu op in separate file
       * test=kunlun
      
      * update XPU error message and remove duplicated code
      
       * test=kunlun
      
      * minor
       * test=kunlun
      
      * minor
       * test=kunlun
      138ecf24
  33. 22 7月, 2020 1 次提交
  34. 08 6月, 2020 1 次提交
  35. 21 4月, 2020 1 次提交
  36. 02 3月, 2020 1 次提交
    • C
      Speed up dygraph DataLoader based on shared memory and LoDTensor serialization (#22541) · 7d8d5734
      Chen Weihang 提交于
      * add lodtensor share memory & serialization, test=develop
      
      * fix windows compile error, test=develop
      
      * deal vartype pickle & fix unittest matching error message, test=develop
      
      * update timeout variable name, test=develop
      
      * refactor memory map implement, test=develop
      
      * clear mmap file discripter when exit unexpectedly, test=develop
      
      * remove the child process fd in advance, test=develop
      
      * remove mmap fds after Queue.put in child process, test=develop
      
      * add hard unittests for register exit func, test=develop
      
      * fix python2 compatibility problem in unittest, test=develop
      
      * fix exception unittest error, test=develop
      
      * polish code based review comment, test=develop
      7d8d5734
  37. 19 12月, 2019 1 次提交
  38. 24 9月, 2019 1 次提交