1. 27 Nov, 2021 · 1 commit
    • [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Aganlengzi committed
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* calls into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
  2. 23 Nov, 2021 · 1 commit
  3. 09 Nov, 2021 · 1 commit
  4. 17 Sep, 2021 · 1 commit
    • Make flag adding easier (#35823) · 2c781455
      Zeng Jinle committed
      * make flag setter easier
      
      * update
      
      * rename macro name
      
      * fix bug of public/writable
      
      * update to pass CI
      
      * polish
      
      * fix CPU link error
  5. 18 Aug, 2021 · 1 commit
    • Add function to disable paddle signal handler (#34577) · dd533dd3
      Zhanlue Yang committed
      * Add function to disable paddle signal handler
      
      Paddle used google::InstallFailureSignalHandler to handle selected system signals,
      mainly for debugging and bug-report purposes.
      
      However, this can conflict with other Python packages that capture the same signals,
      such as tvm.
      
      To resolve this issue, we provide a function to disable the signal handler.
      
      * Remove signal test from WIN32 platform
      
      * Remove redundant return from disable_signal_handler() function
      
      * Add detailed messages to en_doc
  6. 12 Aug, 2021 · 1 commit
  7. 03 Aug, 2021 · 1 commit
  8. 09 Apr, 2021 · 1 commit
    • [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
  9. 22 Feb, 2021 · 1 commit
  10. 04 Feb, 2021 · 1 commit
  11. 15 Jan, 2021 · 1 commit
  12. 17 Dec, 2020 · 1 commit
  13. 20 Nov, 2020 · 1 commit
  14. 04 Nov, 2020 · 1 commit
  15. 30 Oct, 2020 · 1 commit
  16. 21 Aug, 2020 · 1 commit
    • support Baidu Kunlun AI Accelerator (#25959) · 138ecf24
      QingshuChen committed
      * support Baidu AI Accelerator
        * test=kunlun
      
      * minor
       * test=kunlun
      
      * support xpu op in separate file
       * test=kunlun
      
      * update XPU error message and remove duplicated code
      
       * test=kunlun
      
      * minor
       * test=kunlun
      
      * minor
       * test=kunlun
  17. 04 Aug, 2020 · 1 commit
  18. 29 Jul, 2020 · 2 commits
  19. 15 Jul, 2020 · 1 commit
  20. 07 Jul, 2020 · 1 commit
  21. 03 Jun, 2020 · 1 commit
  22. 01 Jun, 2020 · 1 commit
  23. 19 May, 2020 · 1 commit
  24. 29 Apr, 2020 · 1 commit
  25. 04 Apr, 2020 · 1 commit
    • Dev/fix init flags (#23465) · f297a332
      Leo Chen committed
      * fix init_gflags with 'python -c', test=develop
      
      * add test, test=develop
      
      * use sys.executable instead of python, test=develop
      
      * keep dummy, test=develop
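The bullet about `sys.executable` describes a common testing pattern: launch the child interpreter through `sys.executable` (the interpreter running the current process) instead of a bare `python`, which resolves through PATH and may point at a different installation. A minimal sketch, with `run_snippet` as a hypothetical helper name rather than the test actually added in this commit; it also shows that under `python -c`, `sys.argv` is just `['-c']`, which is the argument list any argv-based flag initialization sees in that mode.

```python
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` with `<current interpreter> -c` and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # same interpreter, same site-packages
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return result.stdout
```

For example, `run_snippet("import sys; print(sys.argv)")` returns `"['-c']\n"`.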
  26. 05 Dec, 2019 · 1 commit
  27. 04 Dec, 2019 · 1 commit
  28. 03 Dec, 2019 · 1 commit
  29. 18 Oct, 2019 · 1 commit
  30. 11 Sep, 2019 · 1 commit
    • Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Huihuang Zheng committed
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Since it is a singleton, we can delete it for better memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, thereby avoiding the singleton.
      
      Also added data_feed_proto to operator to fix CI for the CPU build.
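The commit describes the stream-callback mechanism in one sentence. Below is a pure-Python model of that idea, assuming a stream that runs queued host callbacks only after the work enqueued before them has finished (the role `cudaStreamAddCallback` plays in CUDA). `FakeStream` and `StreamBoundAllocator` are hypothetical names, not Paddle classes: freeing a buffer does not release it at once, because kernels already queued on the stream may still read it; the real release is deferred to a callback on that same stream.

```python
from typing import Callable, List, Set, Tuple

Handle = Tuple[int, int]  # (allocation id, size in bytes)


class FakeStream:
    """Stand-in for a device stream: callbacks run when queued work drains."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []

    def add_callback(self, fn: Callable[[], None]) -> None:
        self._callbacks.append(fn)

    def synchronize(self) -> None:
        # Pretend all queued work has completed, then fire the callbacks in order.
        pending, self._callbacks = self._callbacks, []
        for fn in pending:
            fn()


class StreamBoundAllocator:
    """Allocations belong to one stream and are released by that stream's callback."""

    def __init__(self, stream: FakeStream) -> None:
        self.stream = stream
        self.live: Set[Handle] = set()  # buffers whose memory is still held
        self._next_id = 0

    def allocate(self, nbytes: int) -> Handle:
        handle = (self._next_id, nbytes)
        self._next_id += 1
        self.live.add(handle)
        return handle

    def free(self, handle: Handle) -> None:
        # Defer the real release until the stream has finished pending work.
        self.stream.add_callback(lambda: self.live.discard(handle))
```

Because each allocator is tied to one stream, there is no process-wide singleton pool to hold memory longer than the stream that used it.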
  31. 30 Aug, 2019 · 2 commits
  32. 28 Aug, 2019 · 1 commit
  33. 16 Aug, 2019 · 1 commit
  34. 04 Jul, 2019 · 1 commit
  35. 05 Jun, 2019 · 1 commit
  36. 18 Apr, 2019 · 1 commit
  37. 28 Mar, 2019 · 1 commit
  38. 15 Mar, 2019 · 1 commit
    • Support sync batch norm. (#16121) · 8ad672a2
      qingqing01 committed
      * Support Sync Batch Norm.
      * Note: do not enable it when training on a single device.
      
      Usage:
      
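      import paddle.fluid as fluid

      # Assumed context: `tp` is the training Program, `loss_mean` its mean-loss Variable.
      build_strategy = fluid.BuildStrategy()
      build_strategy.sync_batch_norm = True  # share batch-norm statistics across devices
      binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
          loss_name=loss_mean.name,
          build_strategy=build_strategy)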
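Synchronized batch norm normalizes each device's shard with statistics of the whole global batch instead of only the local shard. The sketch below shows the arithmetic for a single channel, with no scale/shift parameters, assuming each device contributes its sum, sum of squares, and element count; `all_reduce` is a hypothetical stand-in for the cross-device reduction, and this is not Paddle's kernel.

```python
import math
from typing import List, Tuple

Stats = Tuple[float, float, int]


def local_stats(xs: List[float]) -> Stats:
    """Per-device partial statistics: sum, sum of squares, element count."""
    return sum(xs), sum(x * x for x in xs), len(xs)


def all_reduce(parts: List[Stats]) -> Stats:
    """Stand-in for the cross-device reduction: element-wise sum of the parts."""
    total, total_sq, count = 0.0, 0.0, 0
    for s, sq, n in parts:
        total += s
        total_sq += sq
        count += n
    return total, total_sq, count


def sync_batch_norm(shards: List[List[float]], eps: float = 1e-5):
    """Normalize every shard with the mean and variance of the global batch."""
    total, total_sq, count = all_reduce([local_stats(xs) for xs in shards])
    mean = total / count
    var = total_sq / count - mean * mean  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    normalized = [[(x - mean) * inv_std for x in xs] for xs in shards]
    return normalized, mean, var
```

With shards `[[1.0, 2.0], [3.0, 4.0]]`, both devices use the global mean 2.5 and variance 1.25, whereas the first device on its own would have used mean 1.5. With a single shard the result equals ordinary batch norm.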