1. 15 Feb, 2022 1 commit
    • [PluggableDevice] Add custom runtime support (#38740) · 3e7825f3
      ronnywang committed
      * [CustomRuntime] Add DeviceManager
      
      * [CustomRuntime] Add DeviceInterface
      
      * [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager
      
      * [CustomRuntime] Add plug-in device
      
      * [CustomRuntime] Memory module support PluggableDevice
      
      * [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option
      
      * update
      
      * [API] update API doc based on comments, test=develop
      Co-authored-by: qili93 <qili93@qq.com>
  2. 27 Jan, 2022 1 commit
    • [PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215
      Aganlengzi committed
      * [Demo] custom kernel based on pten kernel
      
      * merge and npu custom work well
      
      * del comments
      
      * delete other code
      
      * fix CUDAContext
      
      * fix small_vector.h not found
      
      * support NPU
      
      * fix NPUContext
      
      * fix DeviceContext support
      
      * add UT
      
      * fix call
      
      * add UT
      
      * fix
      
      * fix for comments and ut
      
      * add MACRO control
      
      * fix multi input output
      
      * support env CUSTOM_DEVICE_ROOT
      
      * deal with special cases
      
      * fix for Windows
      
      * try coverage with test_custom_kernel_dot.py
      
      * fix test_custom_kernel_dot
      
      * fix test_custom_kernel_dot
      
      * fix merge
      
      * fix merge
      
      * fix CI
      
      * update
      
      * merge and fix
      
      * remove WITH_CUSTOM_KERNEL
      
      * fix merge
      
      * merge and fix
      
      * fix ut
      
      * fix ut for mac
      
      * add more UT
      
      * add more UT
      
      * fix
  3. 20 Dec, 2021 1 commit
  4. 07 Dec, 2021 1 commit
  5. 27 Nov, 2021 1 commit
    • [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Aganlengzi committed
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
  6. 23 Nov, 2021 1 commit
  7. 09 Nov, 2021 1 commit
  8. 17 Sep, 2021 1 commit
    • Make flag adding easier (#35823) · 2c781455
      Zeng Jinle committed
      * make flag setter easier
      
      * update
      
      * rename macro name
      
      * fix bug of public/writable
      
      * update to pass CI
      
      * polish
      
      * fix CPU link error
  9. 18 Aug, 2021 1 commit
    • Add function to disable paddle signal handler (#34577) · dd533dd3
      Zhanlue Yang committed
      * Add function to disable paddle signal handler
      
      Paddle used google::InstallFailureSignalHandler to handle selected system signals,
      mainly for debugging and bug-report purposes.
      
      However, this can conflict with other Python packages that capture the same signals,
      such as TVM.
      
      To resolve this issue, we provide a function to disable the signal handler.
      
      * Remove signal test from WIN32 platform
      
      * Remove redundant return from disable_signal_handler() function
      
      * Add detailed messages to en_doc
  10. 12 Aug, 2021 1 commit
  11. 03 Aug, 2021 1 commit
  12. 09 Apr, 2021 1 commit
    • [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
  13. 22 Feb, 2021 1 commit
  14. 04 Feb, 2021 1 commit
  15. 15 Jan, 2021 1 commit
  16. 17 Dec, 2020 1 commit
  17. 20 Nov, 2020 1 commit
  18. 04 Nov, 2020 1 commit
  19. 30 Oct, 2020 1 commit
  20. 21 Aug, 2020 1 commit
    • support Baidu Kunlun AI Accelerator (#25959) · 138ecf24
      QingshuChen committed
      * support Baidu AI Accelerator
        * test=kunlun
      
      * minor
       * test=kunlun
      
      * support xpu op in separate file
       * test=kunlun
      
      * update XPU error message and remove duplicated code
      
       * test=kunlun
      
      * minor
       * test=kunlun
      
      * minor
       * test=kunlun
  21. 04 Aug, 2020 1 commit
  22. 29 Jul, 2020 2 commits
  23. 15 Jul, 2020 1 commit
  24. 07 Jul, 2020 1 commit
  25. 03 Jun, 2020 1 commit
  26. 01 Jun, 2020 1 commit
  27. 19 May, 2020 1 commit
  28. 29 Apr, 2020 1 commit
  29. 04 Apr, 2020 1 commit
    • Dev/fix init flags (#23465) · f297a332
      Leo Chen committed
      * fix init_gflags with 'python -c', test=develop
      
      * add test, test=develop
      
      * use sys.executable instead of python, test=develop
      
      * keep dummy, test=develop
  30. 05 Dec, 2019 1 commit
  31. 04 Dec, 2019 1 commit
  32. 03 Dec, 2019 1 commit
  33. 18 Oct, 2019 1 commit
  34. 11 Sep, 2019 1 commit
    • Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Huihuang Zheng committed
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it improves memory usage.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, avoiding the singleton.
      
      Also adds data_feed_proto to operator to fix the CI in CPU compilation.
  35. 30 Aug, 2019 2 commits
  36. 28 Aug, 2019 1 commit
  37. 16 Aug, 2019 1 commit
  38. 04 Jul, 2019 1 commit